Abstract

Consider any random graph model where potential edges appear independently, with
possibly different probabilities, and assume that the minimum expected degree is $\omega(\ln n)$.
We prove that the adjacency matrix and the Laplacian of that random graph are concentrated
around the corresponding matrices of the weighted graph whose edge weights are the
probabilities in the random model.

We apply this result to two different settings. In bond percolation, we show that, whenever
the minimum expected degree in the random model is not too small, the Laplacian
of the percolated graph is typically close to that of the original graph. As a corollary, we
improve upon a bound for the spectral gap of the percolated graph due to Chung and Horn.

We also consider inhomogeneous random graphs with average degree $\gg \ln n$. In this case
we show that the adjacency matrix of the random graph can be approximated (in a suitable
sense) by an integral operator defined in terms of the attachment kernel $\kappa$.

Our main proof tool might be of independent interest: a new concentration inequality for
matrix martingales that generalizes Freedman's inequality for the standard scalar setting.

1 Introduction 

Much of probabilistic combinatorics deals with questions of the following type: 

Question 1.1 Given a probability distribution over "large" combinatorial objects $X$ and a real-valued
parameter $P = P(X)$ defined over such objects, does there exist a typical value $P_{\rm typ}$ such
that $P(X)$ is very likely to be close to $P_{\rm typ}$?



Starting with the seminal work of Shamir and Spencer [57] on the chromatic number of
$G_{n,p}$, many answers to instances of the above question have been obtained via concentration
inequalities, and developments in the two fields have often gone hand in hand; see [46, 7] and
the references therein for many examples.

In this paper we introduce a new concentration inequality for random Hermitian matrices
in order to address a variant of Question 1.1. Our combinatorial objects consist of random
graphs with independent edges. These are random graphs where the events "$ij$ is an edge" (with
$ij$ varying over all unordered pairs of vertices) are independent, but not necessarily identically
distributed. The new twist is that the "parameters" for which we prove concentration are the
adjacency matrix and the graph Laplacian of the resulting graph (defined in Section 2.3).

We briefly recall why these two matrices are important. Many (real-valued) parameters
of a graph can be computed and/or estimated from these two matrices, including the diameter,
distances between distinct subsets, discrepancy-like properties, path congestion, chromatic
number and the mixing time for random walk; see e.g. [22] for a compendium of these results,
[43, 24, ?] for the relationship between the two matrices and "pseudo-random" properties of
graphs and [?, 33] for algorithmic applications. Given these facts, our main Theorem (stated
below) sheds some light on the typical properties of the corresponding random graph models.

Theorem 1.1 (Loosely stated) Let $G_p$ be a random graph on vertex set $[n]$ where each potential
edge $ij$, $1 \le i \le j \le n$, appears with probability $p(i,j)$. Let $A_p$ and $\mathcal{L}_p$ be the adjacency
matrix and graph Laplacian of $G_p$, and let $A_p^{\rm typ}$ and $\mathcal{L}_p^{\rm typ}$ be the adjacency matrix and Laplacian of
the weighted graph $G_p^{\rm typ}$ where $ij$ has weight $p(i,j)$ for each pair $ij$. Define $d$, $\Delta$ as the minimum
and maximum weighted degrees in $G_p^{\rm typ}$. Then there exists a universal constant $C > 0$ such that,
if $\Delta > C \ln n$,

$$\|A_p - A_p^{\rm typ}\| = O\left(\sqrt{\Delta \ln n}\right) \mbox{ with high probability}$$

and, if $d > C \ln n$,

$$\|\mathcal{L}_p - \mathcal{L}_p^{\rm typ}\| = O\left(\sqrt{\frac{\ln n}{d}}\right) \mbox{ with high probability.}$$




A more precise quantitative statement of Theorem 1.1 is given in Section 3 below.

Theorem 1.1 is related to several known results about the standard Erdős-Rényi graph $G_{n,p}$
(the special case where $p(i,j) = p$ for $i \neq j$). We will show in Section 4 that the kind of
matrix concentration we prove here is implicit in the literature and that the standard notion of
quasi-randomness for dense graphs [?, 43] can be reformulated in terms of concentration of the
adjacency matrix around the "typical matrix" for the corresponding $G_{n,p}$ model. There is also
a relationship between concentration of the Laplacian and quasi-randomness for given degree
sequences [?, ?, 24], which is briefly discussed in Section 4.1.

For the special cases just described, the bounds obtained from Theorem 1.1 for the Laplacian
are qualitatively sharp, in the sense that they become trivial at roughly the same point where
one cannot expect concentration to hold. However, more specialized (and much more complex)
approaches yield improved bounds [?]. In some sense, this is due to the fact that the
typical adjacency matrices and Laplacians for such random graph models turn out to be very
degenerate: one of the eigenvalues of each matrix has multiplicity $n - 1$, and the other eigenvalue
is well separated from the first.

The cases where this does not happen turn out to be more interesting. For instance, consider
the case of bond percolation with a parameter $p \in (0,1)$ on an arbitrary $n$-vertex graph $G$.
That is, we consider a random subgraph $G_p$ of $G$ that is obtained by retaining each edge of
$G$ independently with probability $p$. Let $A$ be the adjacency matrix and $\mathcal{L}$ be the Laplacian
of $G$ (respectively). We will show that when the minimum expected degree in $G_p$ is $\omega(\ln n)$,
the adjacency matrix and Laplacian of $G_p$ are close to $pA$ and $\mathcal{L}$ (respectively); therefore,
any estimate for $G$ derived from $\mathcal{L}$ continues to hold (at least approximately) for the random
subgraph. A simple corollary of our Theorem is a bound for the spectral gap of $G_p$ that improves
upon a recent result of Chung and Horn [26], derived via much more complicated methods.

We then turn to the general model of inhomogeneous random graphs. These are built from
a set of points $X_1, \dots, X_n$ that are uniformly distributed over $[0,1]$. The probability that $i$ and
$j$ are connected in the random graph is $p\kappa(X_i, X_j)$, where $\kappa : \mathbb{R} \times \mathbb{R} \to \mathbb{R}_+$ is a symmetric
function (called a kernel) and $p$ is a parameter that controls the density of the resulting graph.
Under some technical conditions, we will show that, for $p = \omega(\ln n/n)$, the adjacency matrix of
the random graph will correspond to a kind of discretization of an integral operator $T_\kappa$ defined
in terms of $\kappa$. Theorem 1.1 takes care of the key step where we show that the adjacency matrix
is concentrated around a deterministic matrix; the rest of the argument consists of proving that
the latter matrix is an approximation of $T_\kappa$ in some suitable sense. The end result implies that
the random graph and the kernel $\kappa$ are close in a metric that is stronger than the cut metric
from the literature on graph limits [16, 17, 49, 13]. Our results also imply that the eigenvalue
distributions and the eigenvectors of the adjacency matrix of the random graph model are closely
related to those of $T_\kappa$.

1.1 A new concentration inequality 



The main result, Theorem 3.1, is a straightforward consequence of a new concentration inequality
for random matrices. Our result bounds the fluctuations $Z - \mathbb{E}[Z]$ of certain random $d \times d$
Hermitian matrices $Z$ from their mean (defined entrywise), as measured by the largest eigenvalue
$\lambda_{\max}(Z - \mathbb{E}[Z])$ and the spectral norm $\|Z - \mathbb{E}[Z]\|$.

Not much is known in general about such inequalities. This is in sharp contrast with the
scalar case, where there are several remarkable inequalities and many techniques to prove them
[?, ?, ?]. The concentration results for random matrices that have been proven correspond
to relatively old developments in the scalar case, such as the standard bounds due to Chernoff
[?, ?] and Hoeffding [39, 21], as well as Khintchine's inequality [?, 56]. Accordingly, the new
concentration result we introduce in this paper is a matrix analogue of Freedman's inequality
for martingale sequences [34], which dates back to the 1970's. Here is a precise statement.



[Measurability and conditional expectations are defined entrywise; see Section 2.4 for this and
other definitions.]

Theorem 1.2 (Freedman's Inequality for Matrix Martingales) Let

$$0 = Z_0, Z_1, \dots, Z_n$$

be a sequence of random $d \times d$ Hermitian matrices that forms a martingale sequence with respect
to the filtration $\mathcal{F}_0, \mathcal{F}_1, \dots, \mathcal{F}_n$ (that is, for each $1 \le i \le n$, $Z_i$ is $\mathcal{F}_i$-measurable and
$\mathbb{E}[Z_i \mid \mathcal{F}_{i-1}] = Z_{i-1}$). Suppose further that $\|Z_i - Z_{i-1}\| \le M$ almost surely for each $1 \le i \le n$ and define:

$$W_n \equiv \sum_{i=1}^{n} \mathbb{E}\left[(Z_i - Z_{i-1})^2 \mid \mathcal{F}_{i-1}\right].$$

Then for all $t, \sigma > 0$:

$$\mathbb{P}\left(\lambda_{\max}(Z_n) \ge t,\ \lambda_{\max}(W_n) \le \sigma^2\right) \le d\, e^{-\frac{t^2}{8\sigma^2 + 4Mt}}.$$



Compared with Freedman's original bound, Theorem 1.2 has worse constants in the exponent
and an extra $d$ factor (which is necessary; cf. Section [?]), but the two bounds are otherwise of
the same form. In this paper we only need a version of Theorem 1.2 for independent sums (cf.
Remark 7.1 and Corollary 7.1), but the martingale inequality is not any harder to prove.

The proof of Theorem 1.2 follows a methodology first proposed by Ahlswede and Winter
[?]. These authors proved a version of the Chernoff bound for matrices which has had a very
strong impact on the development of Quantum Information Theory [?, ?, 60]. Christofides and
Markström [21] used the same method to obtain a version of Hoeffding's inequality for matrix
martingales.



Theorem 1.3 ([21], in abridged form) In the setting of Theorem 1.2, replace the assumption
on $\|Z_i - Z_{i-1}\|$ by the assumption that there exist $0 < r_i < 1$ such that $\lambda_{\max}(Z_i - Z_{i-1}) \le 1 - r_i$
and $\lambda_{\max}(Z_{i-1} - Z_i) \le r_i$. Then for all $t > 0$,

$$\mathbb{P}\left(\lambda_{\max}(Z_n) \ge t\right) \le d\, e^{-n H_R(R + t/n)}$$

where $R \equiv \sum_{i=1}^{n} r_i/n$ and for $x, r \in [0,1]$

$$H_r(x) \equiv x \ln\left(\frac{x}{r}\right) + (1 - x) \ln\left(\frac{1 - x}{1 - r}\right).$$

As we will see in Remark 3.1, this bound would not suffice for our applications. Roughly
speaking, our Theorem is better because the variance term in our bound is the largest eigenvalue of a
sum of matrices, not the sum of the largest eigenvalues. In that respect, Theorem 1.2 is closer to
an influential bound obtained by Rudelson [56] via certain inequalities from non-commutative
probability [50]. The Ahlswede-Winter approach we adopt here has the advantage of requiring
no such unfamiliar tools.^1



Theorem 1.2 should also be contrasted with other ways for controlling eigenvalues and eigenvectors
of random matrices. One of them is the "trace method" [?, 28, 38, 35, 36], which consists
of analyzing traces of high powers of the matrices under consideration. This method can be very
sharp, but it is also quite complex and we will see that we obtain better bounds in one context
(but not all contexts) where the trace method has been applied. A more recent way of bounding
eigenvalues and eigenvectors is in some sense based on bounding "discrepancies" [33, ?]. This
is better than our bound when the technique applies (see e.g. the comments in Section 4.1), but
our main applications seem to be beyond the reach of this methodology.

Finally, we note that our result is not quite comparable to the concentration bounds of Alon,
Krivelevich and Vu [?] for the largest eigenvalues of a random symmetric matrix. Our bound is
poorer than theirs when applied to the $k$-th largest eigenvalue for any fixed $k$, but their bound quickly
deteriorates when $k$ grows, whereas our result bounds the maximal deviation of all eigenvalues
simultaneously (cf. Corollary 3.1), as well as the deviation of eigenspaces (cf. Corollary 3.2).
Moreover, their result cannot be used to determine the typical value of each eigenvalue.



1.2 Organization 

The remainder of the paper is organized as follows. After the preliminary Section 2, we prove the
main concentration result in Section 3. As a test case, we apply our results to the Erdős-Rényi
random graph in Section 4, where the connection with quasi-randomness is also discussed. Bond
percolation is discussed in Section 5. The more complicated case of inhomogeneous random
graphs is treated in Section 6, where we also compare our results to what is known about graph
limits. The new concentration inequality is proven in Section 7. Some final remarks are made
in Section 8. The Appendix contains two simple results on the perturbation theory of compact
operators for which we did not find adequate references.

1 There is now a proof of Rudelson's bound along the lines of the Ahlswede-Winter method; see [53] for details
and for further discussion on the difference between the three bounds.

2 Preliminaries



2.1 Matrix notation 

The space of $d_r \times d_c$ matrices with real (resp. complex) entries will be denoted by $\mathbb{R}^{d_r \times d_c}$ (resp.
$\mathbb{C}^{d_r \times d_c}$). Moreover, for $A \in \mathbb{C}^{d_r \times d_c}$, $A^* \in \mathbb{C}^{d_c \times d_r}$ is the conjugate transpose of $A$. We will
identify $\mathbb{R}^d$ (resp. $\mathbb{C}^d$) with the space $\mathbb{R}^{d \times 1}$ (resp. $\mathbb{C}^{d \times 1}$) of column vectors, so that the inner
product of $v, w \in \mathbb{R}^d$ is $w^* v$. $\|\cdot\|$ denotes both the Euclidean norm on $\mathbb{R}^d$ or $\mathbb{C}^d$ and the spectral
radius norm induced on $\mathbb{C}^{d \times d'}$:

$$\|A\| = \sup_{v \in \mathbb{C}^{d'},\ \|v\| = 1} \|Av\|, \quad A \in \mathbb{C}^{d \times d'}.$$

$\mathbb{C}^{d \times d}_{\rm Herm}$ is the space of $d \times d$ Hermitian matrices, which are the $A \in \mathbb{C}^{d \times d}$ with $A^* = A$. $\mathbb{R}^{d \times d}_{\rm Herm}$
is similarly defined; one could of course speak of symmetric matrices in this case and use $A^T$
instead of $A^*$, but we will keep notation consistent.

The spectral theorem implies that for any $A \in \mathbb{C}^{d \times d}_{\rm Herm}$ there exist real numbers $\lambda_0(A) \le
\lambda_1(A) \le \dots \le \lambda_{d-1}(A)$ and orthonormal vectors $\psi_0, \dots, \psi_{d-1}$ (the eigenvalues and eigenvectors
of $A$, respectively) with:

$$A = \sum_{i=0}^{d-1} \lambda_i(A)\, \psi_i \psi_i^*.$$

The spectrum of $A$ is the set ${\rm spec}(A)$ of all $\lambda_i(A)$. The above formula implies that for $A \in \mathbb{C}^{d \times d}_{\rm Herm}$:

$$\|A\| = \max_{0 \le i \le d-1} |\lambda_i(A)| = \max_{v \in \mathbb{C}^d,\ \|v\| = 1} |v^* A v|.$$

For $A \in \mathbb{R}^{d \times d}_{\rm Herm}$, the eigenvectors of $A$ are all real and one only needs to maximize over $v \in \mathbb{R}^d$
in the above formula to compute $\|A\|$.

We also note an equivalent statement of the spectral theorem as:

$$A = \sum_{\alpha \in {\rm spec}(A)} \alpha\, \Pi_\alpha,$$

where the $\{\Pi_\alpha\}_{\alpha \in {\rm spec}(A)}$ are projections with orthogonal ranges and $\sum_{\alpha \in {\rm spec}(A)} \Pi_\alpha = I_d$, the
$d \times d$ identity matrix. The multiplicity of $\alpha \in {\rm spec}(A)$ is the dimension of the range of the
corresponding $\Pi_\alpha$; this is equal to the number of $0 \le i \le d - 1$ with $\lambda_i(A) = \alpha$.

2.2 Integral operators on L 2 ([0,1]) and spectral theory 

In Section 6 we will compare adjacency matrices with certain integral operators on $L^2([0,1])$.
The spectral theory of these and other compact operators is a classical topic in Functional
Analysis and we refer to [55, 45] for all the results we review in this Section.

We will work with the space $L^2([0,1])$ of real measurable functions that are square-integrable
with respect to Lebesgue measure. This space has a natural inner product

$$\langle f, g \rangle_{L^2} \equiv \int_0^1 f(x) g(x)\, dx \quad (f, g \in L^2([0,1]))$$

and an associated norm $\|f\|_{L^2}^2 \equiv \langle f, f \rangle_{L^2}$ with respect to which it is a real Hilbert space.

Given a function $\eta \in L^2([0,1]^2)$ (the latter space being defined similarly to $L^2([0,1])$), one
can define a linear operator $T_\eta$ on $L^2([0,1])$ by the formula:

$$T_\eta : f(\cdot) \in L^2([0,1]) \mapsto (T_\eta f)(\cdot) \equiv \int_0^1 \eta(\cdot, y) f(y)\, dy. \quad (2.1)$$

The "$L^2 \to L^2$" norm of a linear operator $V$ from $L^2([0,1])$ to itself is given by:

$$\|V\|_{L^2 \to L^2} \equiv \sup_{f \in L^2([0,1]) \setminus \{0\}} \frac{\|Vf\|_{L^2}}{\|f\|_{L^2}}.$$

It is an exercise to show via the Cauchy-Schwarz inequality that:

$$\|T_\eta\|^2_{L^2 \to L^2} \le \int_{[0,1]^2} \eta^2(x, y)\, dx\, dy. \quad (2.2)$$

Moreover, if $\eta' : [0,1]^2 \to \mathbb{R}$ is also square integrable, $T_\eta - T_{\eta'}$ equals $T_{\eta - \eta'}$.
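The bound (2.2) can be checked numerically on a discretized kernel. The following is only a minimal sketch under stated assumptions: the helper names (`discretize_kernel`, `frobenius_norm`, `spectral_norm_lower_bound`), the midpoint grid and the example kernel $\eta(x,y) = \min(x,y)$ are illustrative choices that do not come from the paper. The $m \times m$ matrix with entries $\eta(x_i, x_j)/m$ plays the role of $T_\eta$, and its Frobenius norm plays the role of the right-hand side of (2.2).

```python
import math

def discretize_kernel(eta, m):
    """Return the m x m matrix K[i][j] = eta(x_i, x_j) / m on the midpoint grid."""
    xs = [(i + 0.5) / m for i in range(m)]
    return [[eta(x, y) / m for y in xs] for x in xs]

def frobenius_norm(K):
    """Discrete analogue of the square root of the integral of eta^2 over [0,1]^2."""
    return math.sqrt(sum(v * v for row in K for v in row))

def spectral_norm_lower_bound(K, iters=200):
    """Power iteration on K^T K. Each value ||K v|| with ||v|| = 1 is a lower
    bound on the spectral norm of K, so the returned estimate never exceeds it."""
    m = len(K)
    v = [1.0 / math.sqrt(m)] * m
    est = 0.0
    for _ in range(iters):
        w = [sum(K[i][j] * v[j] for j in range(m)) for i in range(m)]   # K v
        est = math.sqrt(sum(x * x for x in w))
        u = [sum(K[i][j] * w[i] for i in range(m)) for j in range(m)]   # K^T K v
        norm_u = math.sqrt(sum(x * x for x in u))
        if norm_u == 0.0:
            break
        v = [x / norm_u for x in u]
    return est

# Illustrative kernel (an assumption made for this sketch): eta(x, y) = min(x, y).
K = discretize_kernel(lambda x, y: min(x, y), 40)
op_norm = spectral_norm_lower_bound(K)
hs_norm = frobenius_norm(K)
print(op_norm, hs_norm)
```

For this kernel the two printed numbers are close to each other, which reflects that the operator is dominated by a single eigenvalue; for a generic kernel the gap in (2.2) can be much larger.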

Assume that $\eta(x,y) = \eta(y,x)$ for almost every $(x,y) \in [0,1]^2$ (i.e. $\eta$ is symmetric). In that
case the operator $T_\eta$ is a compact, self-adjoint linear operator on the Hilbert space $L^2([0,1])$.

Let us recall what these properties imply. Let $T$ be a bounded, compact, self-adjoint operator
on the Hilbert space $\mathcal{H}$. Then there exists a finite or countable set $S \subset \mathbb{R}$ and a family $\{P_\alpha :
\alpha \in S\}$ of orthogonal projection operators on $\mathcal{H}$ with orthogonal ranges such that:

$$T = \sum_{\alpha \in S} \alpha P_\alpha \quad \mbox{and} \quad {\rm Id}_{\mathcal{H}} \equiv \mbox{identity operator on } \mathcal{H} = \sum_{\alpha \in S} P_\alpha.$$

Moreover, either $S$ is finite and contains $0$, or $S$ is a countable, bounded subset of $\mathbb{R}$ with $0$ as its
only accumulation point. Finally, all $P_\alpha$ for $\alpha \neq 0$ are finite-dimensional; the multiplicity of $\alpha$
is precisely the dimension of the range of $P_\alpha$. The spectrum of $T$ is the set ${\rm spec}(T) = S \cup \{0\}$.

2.3 Concepts from Graph Theory 

For our purposes a graph $G = (V,E)$ consists of a finite set $V$ of vertices and a set $E$ of edges,
which are subsets of size 1 (loops) or 2 of $V$ (we do not allow for parallel edges). Unless otherwise
noted, we will assume that $V = [n]$ for some integer $n \ge 2$, where $[n] \equiv \{1, 2, \dots, n\}$. We will
write edges as pairs $ij$ (allowing for $i = j$), but we make no distinction between $ij$ and $ji$. We
will also write $i \sim_G j$ to mean that $ij \in E$. The degree $d_G(i)$ of a vertex $i$ is the number of
$1 \le j \le n$ such that $ij \in E$.

Assume that $V = [n]$. The adjacency matrix of $G$ is the $n \times n$ matrix $A = A_G$ such that, for
all $1 \le i, j \le n$, the $(i,j)$-th entry of $A$ is 1 if $ij \in E$ and 0 otherwise. The Laplacian $\mathcal{L} = \mathcal{L}_G$ of
$G$ is the matrix:

$$\mathcal{L}_G \equiv I_n - T_G A_G T_G$$

where $T = T_G$ is the $n \times n$ diagonal matrix whose $(i,i)$-th entry is $d_G(i)^{-1/2}$ if $d_G(i) \neq 0$, or 0 if
$d_G(i) = 0$. We also let

$$\lambda(G) \equiv \min\{\lambda_1(\mathcal{L}),\ 2 - \lambda_{n-1}(\mathcal{L})\}$$

denote the spectral gap of $G$.

We will also consider weighted graphs, which correspond to a graph $H = (V', E')$ where
a positive weight $w_e > 0$ is assigned to each edge $e \in E'$. This is the same as defining a
symmetric function $w : (V')^2 \to [0, +\infty)$ (i.e. $w(i,j) = w(j,i) \ge 0$ for all $i,j \in V'$) and setting
$E' \equiv \{\{i,j\} : w(i,j) > 0\}$. In this case, the degree of $i \in V'$ is defined as

$$d_H(i) \equiv \sum_{j \in V'} w(i,j).$$

Assume $V' = [m]$. The adjacency matrix of such an $H$ is the $m \times m$ matrix $A_H$ where for
each $1 \le i, j \le m$ the $(i,j)$-th entry of $A_H$ is $w(i,j)$. The Laplacian is defined as

$$\mathcal{L}_H \equiv I_m - T_H A_H T_H,$$

where $T_H$ is defined as before, but with the new notion of degree. The definition of $\lambda(H)$ is the
same as for unweighted graphs.
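The definitions of this section translate directly into code. The sketch below is illustrative only: the function names are hypothetical, vertices are indexed from 0 rather than from 1, and matrices are stored as plain nested lists. The same `laplacian` routine covers the weighted case, since it only uses row sums as degrees.

```python
import math

def adjacency_matrix(n, edges):
    """0/1 adjacency matrix on vertices 0..n-1; a pair (i, i) encodes a loop."""
    A = [[0.0] * n for _ in range(n)]
    for i, j in edges:
        A[i][j] = 1.0
        A[j][i] = 1.0
    return A

def laplacian(A):
    """L = I - T A T with T = diag(d(i)^(-1/2)), and T(i, i) = 0 when d(i) = 0.
    Works for weighted adjacency matrices too: the degree is the row sum."""
    n = len(A)
    deg = [sum(row) for row in A]
    t = [1.0 / math.sqrt(d) if d > 0 else 0.0 for d in deg]
    return [[(1.0 if i == j else 0.0) - t[i] * A[i][j] * t[j] for j in range(n)]
            for i in range(n)]

# A triangle on vertices 0, 1, 2 plus the isolated vertex 3.
A = adjacency_matrix(4, [(0, 1), (1, 2), (0, 2)])
L = laplacian(A)
for row in L:
    print(row)
```

Note that, with the convention above, an isolated vertex contributes a diagonal entry equal to 1 in the Laplacian, because the corresponding entry of $T$ is set to 0.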

2.4 Probability with matrices 

We will be dealing with random Hermitian matrices throughout the paper. Following common
practice, we will always assume that we have a probability space $(\Omega, \mathcal{F}, \mathbb{P})$ in the background
where all random variables are defined.

Call a map $X : \Omega \to \mathbb{C}^{d \times d}_{\rm Herm}$ a random $d \times d$ Hermitian matrix (or a $\mathbb{C}^{d \times d}_{\rm Herm}$-valued random
variable) if for each $1 \le i,j \le d$, the function $X(i,j) : \Omega \to \mathbb{C}$ corresponding to the $(i,j)$-th
entry of $X$ is $\mathcal{F}$-measurable. We say that $X$ is integrable if all of these entries are and let $\mathbb{E}[X]$
be the matrix whose $(i,j)$-th entry is $\mathbb{E}[X(i,j)]$. Conditional expectations with respect to a
sub-$\sigma$-field are also defined entrywise.
If the entries are also square-integrable, one can define the variance by the usual formula,

$$\mathbb{V}(X) \equiv \mathbb{E}\left[(X - \mathbb{E}[X])^2\right].$$

The standard identity $\mathbb{V}(X) = \mathbb{E}[X^2] - \mathbb{E}[X]^2$ also holds in this setting.

We will need two easily checked properties of matrix (conditional) expectations, valid for all
integrable random $d \times d$ Hermitian matrices $X$ and $Y$ and any sub-$\sigma$-field $\mathcal{G} \subset \mathcal{F}$:

[Tr and $\mathbb{E}[\dots]$ commute] ${\rm Tr}(\mathbb{E}[X]) = \mathbb{E}[{\rm Tr}(X)]$. (2.3)
[Conditioning] If $Y$ is $\mathcal{G}$-measurable, $\mathbb{E}[XY \mid \mathcal{G}] = \mathbb{E}[X \mid \mathcal{G}]\, Y$
and $\mathbb{E}[XY] = \mathbb{E}[\mathbb{E}[X \mid \mathcal{G}]\, Y]$. (2.4)



3 Concentration of graph matrices 

In this section we state and prove our main result, Theorem 1.1.

Given $n \in \mathbb{N} \setminus \{0, 1\}$, let $p : [n]^2 \to [0,1]$ be symmetric: $p(i,j) = p(j,i)$ for all $1 \le i,j \le n$.
Define independent 0/1 random variables $\{I_{ij} : 1 \le i \le j \le n\}$ with

$$\mathbb{P}(I_{ij} = 1) = 1 - \mathbb{P}(I_{ij} = 0) = p(i,j).$$

We also define $I_{ji} = I_{ij}$ for $j > i$.

Define a random unweighted graph $G_p$ with vertex set $[n]$ and edge set

$$E_p \equiv \{ij : 1 \le i \le j \le n,\ I_{ij} = 1\}.$$

Let $A_p$ and $\mathcal{L}_p$ be the adjacency matrix and Laplacian of the graph $G_p$. We will compare
these to the corresponding matrices $A_p^{\rm typ}$, $\mathcal{L}_p^{\rm typ}$ of the weighted graph $G_p^{\rm typ}$ defined by the function
$p$.

The following is a more precise statement of Theorem 1.1.

Theorem 3.1 (Existence of typical graph matrices) For any constant $c > 0$ there exists
another constant $C = C(c) > 0$, independent of $n$ or $p$, such that the following holds. Let
$d \equiv \min_{i \in [n]} d_{G_p^{\rm typ}}(i)$, $\Delta \equiv \max_{i \in [n]} d_{G_p^{\rm typ}}(i)$. If $\Delta > C \ln n$, then for all $n^{-c} \le \delta \le 1/2$,

$$\mathbb{P}\left(\|A_p - A_p^{\rm typ}\| \le 4\sqrt{\Delta \ln(n/\delta)}\right) \ge 1 - \delta.$$

Moreover, if $d > C \ln n$, then for the same range of $\delta$:

$$\mathbb{P}\left(\|\mathcal{L}_p - \mathcal{L}_p^{\rm typ}\| \le 14\sqrt{\frac{\ln(4n/\delta)}{d}}\right) \ge 1 - \delta.$$
We will quickly derive some corollaries before we prove Theorem 3.1. 

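Let $B_1, B_2 \in \mathbb{C}^{n \times n}_{\rm Herm}$. Standard eigenvalue interlacing inequalities imply:

$$\max_{i \in \{0, \dots, n-1\}} |\lambda_i(B_1) - \lambda_i(B_2)| \le \|B_1 - B_2\|. \quad (3.1)$$

This immediately implies that:

Corollary 3.1 In the setting of Theorem 3.1,

$$\|A_p - A_p^{\rm typ}\| \le 4\sqrt{\Delta \ln(n/\delta)} \Rightarrow \forall\, 0 \le i \le n-1,\ |\lambda_i(A_p) - \lambda_i(A_p^{\rm typ})| \le 4\sqrt{\Delta \ln(n/\delta)}.$$

Therefore, the RHS holds with probability $\ge 1 - \delta$ for any $n^{-c} \le \delta \le 1/2$ if $\Delta > C \ln n$. Similarly,

$$\|\mathcal{L}_p - \mathcal{L}_p^{\rm typ}\| \le 14\sqrt{\frac{\ln(4n/\delta)}{d}} \Rightarrow \forall\, 0 \le i \le n-1,\ |\lambda_i(\mathcal{L}_p) - \lambda_i(\mathcal{L}_p^{\rm typ})| \le 14\sqrt{\frac{\ln(4n/\delta)}{d}}$$

and the RHS holds with probability $\ge 1 - \delta$ for all $\delta$ as above if $d > C \ln n$.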

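Now consider some $B \in \mathbb{C}^{n \times n}_{\rm Herm}$ and, for $a < b$ real, let $\Pi_{a,b}(B)$ be the orthogonal projector
onto the space spanned by the eigenvectors of $B$ corresponding to eigenvalues in $[a,b]$. The
following corollary is a consequence of Lemma A.2 in the Appendix, as all operators on a
finite-dimensional Hilbert space are compact.

Corollary 3.2 Given some $\gamma > 0$, let $N_\gamma(A_p^{\rm typ})$ be the set of all pairs $a < b$ such that $a + \gamma < b - \gamma$
and $A_p^{\rm typ}$ has no eigenvalues in $(a - \gamma, a + \gamma) \cup (b - \gamma, b + \gamma)$. Then for $\gamma > 4\sqrt{\Delta \ln(n/\delta)}$,

$$\|A_p - A_p^{\rm typ}\| \le 4\sqrt{\Delta \ln(n/\delta)} \Rightarrow \forall (a,b) \in N_\gamma(A_p^{\rm typ}),\ \|\Pi_{a,b}(A_p) - \Pi_{a,b}(A_p^{\rm typ})\| \le \left(\frac{4(b - a + 2\gamma)}{\pi\left(\gamma^2 - \gamma\sqrt{\Delta \ln(n/\delta)}\right)}\right)\sqrt{\Delta \ln(n/\delta)}.$$

In particular, the RHS holds with probability $\ge 1 - \delta$ for any $n^{-c} \le \delta \le 1/2$.

Define $N_\gamma(\mathcal{L}_p^{\rm typ})$ similarly. Then for $\gamma > 14\sqrt{\ln(4n/\delta)/d}$,

$$\|\mathcal{L}_p - \mathcal{L}_p^{\rm typ}\| \le 14\sqrt{\frac{\ln(4n/\delta)}{d}} \Rightarrow \forall (a,b) \in N_\gamma(\mathcal{L}_p^{\rm typ}),\ \|\Pi_{a,b}(\mathcal{L}_p) - \Pi_{a,b}(\mathcal{L}_p^{\rm typ})\| \le \left(\frac{14(b - a + 2\gamma)}{\pi\left(\gamma^2 - \gamma\sqrt{\ln(4n/\delta)/d}\right)}\right)\sqrt{\frac{\ln(4n/\delta)}{d}}.$$

In particular, the RHS holds with probability $\ge 1 - \delta$ for any $n^{-c} \le \delta \le 1/2$.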

The upshot is that for any range of eigenvalues of $A_p^{\rm typ}$ (resp. $\mathcal{L}_p^{\rm typ}$) that are well-separated
from the rest of the spectrum, the projection onto the corresponding eigenvectors of $A_p$ (resp. $\mathcal{L}_p$)
will be typically close to that of $A_p^{\rm typ}$ (resp. $\mathcal{L}_p^{\rm typ}$).^2 We will see when dealing with inhomogeneous
random graphs that the separation conditions demanded by the corollary are satisfied in
non-trivial cases.

2 Of course, there is not much one can do near eigenvalue degeneracies, where eigenvectors are typically unstable.

3.1 Proof of the concentration result

We now prove Theorem 3.1.

Proof: [of Theorem 3.1] Let $\{e_i\}_{i=1}^n$ be the canonical basis for $\mathbb{R}^n$. For each $1 \le i,j \le n$, define
a corresponding matrix $A_{ij}$:

$$A_{ij} \equiv \begin{cases} e_i e_j^* + e_j e_i^*, & i \neq j; \\ e_i e_i^*, & i = j. \end{cases} \quad (3.2)$$

One can check that $A_p = \sum_{1 \le i \le j \le n} I_{ij} A_{ij}$ and $A_p^{\rm typ} = \sum_{1 \le i \le j \le n} p(i,j) A_{ij}$. Therefore,

$$A_p - A_p^{\rm typ} = \sum_{1 \le i \le j \le n} X_{ij} \quad \mbox{where } X_{ij} \equiv (I_{ij} - p(i,j)) A_{ij},\ 1 \le i \le j \le n.$$

We wish to apply Theorem 1.2 (or rather, Corollary 7.1 in Section 7) to the above sum. To do
this, we first notice that the random matrices $X_{ij}$, which take values in $\mathbb{C}^{n \times n}_{\rm Herm}$, are independent
(since the $I_{ij}$ are) and have mean zero (since $\mathbb{E}[I_{ij}] = p(i,j)$). Moreover,

$$\|X_{ij}\| \le \|A_{ij}\| = 1$$

as the eigenvalues of $A_{ij}$ are always contained in the set $\{1, 0, -1\}$. Thus the assumptions of
the Corollary apply with $M = 1$, but we still need to compute the sum of the variances. For
this, fix some pair $ij$ and note that:



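$$\mathbb{E}[X_{ij}^2] = \mathbb{E}[(I_{ij} - p(i,j))^2]\, A_{ij}^2 = p(i,j)(1 - p(i,j))\, A_{ij}^2$$

and a computation reveals that

$$A_{ij}^2 = \begin{cases} e_i e_i^* + e_j e_j^*, & i \neq j; \\ e_i e_i^*, & i = j. \end{cases} \quad (3.3)$$

Therefore,

$$\sum_{1 \le i \le j \le n} \mathbb{E}[X_{ij}^2] = \sum_{i=1}^{n} p(i,i)(1 - p(i,i))\, e_i e_i^* + \sum_{1 \le i < j \le n} p(i,j)(1 - p(i,j))\, (e_i e_i^* + e_j e_j^*)$$

$$= \sum_{i=1}^{n} \left( \sum_{j=1}^{n} p(i,j)(1 - p(i,j)) \right) e_i e_i^*.$$

This is a diagonal matrix and its largest eigenvalue is at most

$$\max_{i \in [n]} \sum_{j=1}^{n} p(i,j)(1 - p(i,j)) \le \max_{i \in [n]} \sum_{j=1}^{n} p(i,j) = \Delta.$$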
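The variance computation above can be double-checked numerically on a small example. This is only an illustrative sketch: the helper names and the particular symmetric probability matrix `P` are assumptions made for the demonstration, and vertices are indexed from 0. The code squares each $A_{ij}$ by explicit matrix multiplication rather than assuming (3.3).

```python
def pair_matrix(n, i, j):
    """A_ij from (3.2): e_i e_j^* + e_j e_i^* if i != j, and e_i e_i^* if i == j."""
    A = [[0.0] * n for _ in range(n)]
    A[i][j] = 1.0
    A[j][i] = 1.0
    return A

def matmul(X, Y):
    n = len(X)
    return [[sum(X[i][k] * Y[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def variance_sum(P):
    """Sum over pairs i <= j of p(i,j)(1 - p(i,j)) A_ij^2."""
    n = len(P)
    S = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i, n):
            A = pair_matrix(n, i, j)
            A2 = matmul(A, A)
            w = P[i][j] * (1.0 - P[i][j])
            for a in range(n):
                for b in range(n):
                    S[a][b] += w * A2[a][b]
    return S

# Hypothetical symmetric edge probabilities on 4 vertices.
n = 4
P = [[(1.0 + i + j) / (2.0 * n) for j in range(n)] for i in range(n)]
S = variance_sum(P)
Delta = max(sum(row) for row in P)   # maximum expected degree
print([S[i][i] for i in range(n)], Delta)
```

In this example the off-diagonal entries of `S` vanish and every diagonal entry stays below `Delta`, in line with the displayed bound.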



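One can now apply Corollary 7.1 with $\sigma^2 = \Delta$ and $M = 1$ to obtain:

$$\forall t > 0,\ \mathbb{P}\left(\|A_p - A_p^{\rm typ}\| \ge t\right) \le 2n\, e^{-\frac{t^2}{8\Delta + 4t}}. \quad (3.4)$$

Now let $c > 0$ be given and assume $n^{-c} \le \delta \le 1/2$. Then it is clear that there exists a
$C = C(c)$ independent of $n$ and $p$ such that whenever $\Delta \ge C \ln n$,

$$t = 4\sqrt{\Delta \ln(2n/\delta)} \le 2\Delta.$$

Plugging this $t$ into (3.4) yields:

$$\mathbb{P}\left(\|A_p - A_p^{\rm typ}\| \ge 4\sqrt{\Delta \ln(2n/\delta)}\right) \le 2n\, e^{-\frac{t^2}{16\Delta}} = 2n\, e^{-\frac{16\Delta \ln(2n/\delta)}{16\Delta}} = \delta.$$

This proves the first inequality in Theorem 3.1.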
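The arithmetic in the last display can be replayed numerically. The snippet below is a minimal sketch with hypothetical function names and arbitrarily chosen values of $n$, $\Delta$ and $\delta$; it only evaluates the right-hand side of (3.4) at $t = 4\sqrt{\Delta \ln(2n/\delta)}$ and is not a substitute for the proof.

```python
import math

def matrix_freedman_tail(prefactor, t, sigma2, M):
    """Evaluate prefactor * exp(-t^2 / (8 sigma^2 + 4 M t)), the form of the
    bound in Theorem 1.2 (prefactor d) and in (3.4) (prefactor 2n)."""
    return prefactor * math.exp(-t * t / (8.0 * sigma2 + 4.0 * M * t))

def adjacency_threshold(Delta, n, delta):
    """The choice t = 4 sqrt(Delta ln(2n / delta)) made in the proof."""
    return 4.0 * math.sqrt(Delta * math.log(2.0 * n / delta))

# Illustrative values (assumptions for this sketch, not taken from the paper).
n, Delta, delta = 1000, 200.0, 0.01
t = adjacency_threshold(Delta, n, delta)
tail = matrix_freedman_tail(2 * n, t, Delta, 1.0)
print(t <= 2 * Delta, tail, tail <= delta)
```

Whenever $t \le 2\Delta$, the denominator $8\Delta + 4t$ is at most $16\Delta$, which is exactly why the evaluated tail cannot exceed $\delta$.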



In order to prove the second inequality, we again fix $n^{-c} \le \delta \le 1/2$. Our first task is to
control the vertex degrees in $G_p$. Notice that for each $1 \le i \le n$, $d_{G_p}(i) = \sum_{j=1}^{n} I_{ij}$ is a sum
of independent indicator random variables and the mean of that sum is $d_{G_p^{\rm typ}}(i) \ge d$. Standard
Chernoff bounds [?] (or the case $d = 1$ of our own Corollary 7.1!) imply that there exists a value
of $C = C(c)$ such that for $d > C \ln n$

$$\forall i \in [n],\ \mathbb{P}\left( \left| \frac{d_{G_p}(i)}{d_{G_p^{\rm typ}}(i)} - 1 \right| > 4\sqrt{\frac{\ln(4n/\delta)}{d}} \right) \le \frac{\delta}{2n}.$$

Thus with probability $\ge 1 - \delta/2$ one has that

$$\forall i \in [n],\ \left| \frac{d_{G_p}(i)}{d_{G_p^{\rm typ}}(i)} - 1 \right| \le 4\sqrt{\frac{\ln(4n/\delta)}{d}}. \quad (3.5)$$


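We will use this inequality to compare the matrices

$T \equiv$ diagonal with $d_{G_p}(i)^{-1/2}$ at the $(i,i)$th position
$T_{\rm typ} \equiv$ diagonal with $d_{G_p^{\rm typ}}(i)^{-1/2}$ at the $(i,i)$th position

By increasing $C$ if necessary (and recalling that $\delta \ge n^{-c}$, $d > C \ln n$), we can ensure that
the RHS of (3.5) is at most 3/4. By the Mean Value Theorem, for any $x \in [-3/4, 3/4]$:

$$\left| \sqrt{1 + x} - 1 \right| \le \left( \sup_{\theta \in [-3/4, 3/4]} \frac{1}{2\sqrt{1 + \theta}} \right) |x| = |x|.$$

Applying this to

$$x = \frac{d_{G_p}(i)}{d_{G_p^{\rm typ}}(i)} - 1$$

yields that:

$$\|T^{-1} T_{\rm typ} - I\| = \max_{1 \le i \le n} \left| \sqrt{\frac{d_{G_p}(i)}{d_{G_p^{\rm typ}}(i)}} - 1 \right| \le 4\sqrt{\frac{\ln(4n/\delta)}{d}} \mbox{ with probability } \ge 1 - \delta/2. \quad (3.6)$$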



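We now wish to compare $\mathcal{L}_p = I - T A_p T$ to $\mathcal{L}_p^{\rm typ} = I - T_{\rm typ} A_p^{\rm typ} T_{\rm typ}$. Introduce an intermediate
operator:

$$\mathcal{M} \equiv I - T_{\rm typ} A_p T_{\rm typ}. \quad (3.7)$$

A calculation reveals that:

$$\mathcal{M} = I - (T^{-1} T_{\rm typ})(I - \mathcal{L}_p)(T^{-1} T_{\rm typ}).$$

The spectrum of any Laplacian lies in $[0,2]$ [22]; this implies $\|I - \mathcal{L}_p\| \le 1$. Using this in
conjunction with (3.6) yields:

$$\|\mathcal{M} - \mathcal{L}_p\| = \|(T^{-1} T_{\rm typ})(I - \mathcal{L}_p)(T^{-1} T_{\rm typ}) - (I - \mathcal{L}_p)\|$$
$$\le \|(T^{-1} T_{\rm typ} - I)(I - \mathcal{L}_p)(T^{-1} T_{\rm typ})\| + \|(I - \mathcal{L}_p)(T^{-1} T_{\rm typ} - I)\|$$
$$\mbox{(use "}\|ABC\| \le \|A\|\,\|B\|\,\|C\|\mbox{")} \le \|T^{-1} T_{\rm typ} - I\|\, \|I - \mathcal{L}_p\|\, \|T^{-1} T_{\rm typ}\| + \|I - \mathcal{L}_p\|\, \|T^{-1} T_{\rm typ} - I\|$$
$$\le 10\sqrt{\frac{\ln(4n/\delta)}{d}} \mbox{ with probability } \ge 1 - \delta/2,$$

where again we increase $C$ if necessary to ensure that $d > C \ln n$ and $\delta \ge n^{-c}$ imply the desired
bound.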

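To finish the proof, we must show that $\|\mathcal{M} - \mathcal{L}_p^{\rm typ}\| \le 4\sqrt{\ln(4n/\delta)/d}$ with probability
$\ge 1 - \delta/2$. For this we will use the concentration result, Corollary 7.1. One can write:

$$\mathcal{L}_p^{\rm typ} - \mathcal{M} = \sum_{1 \le i \le j \le n} T_{\rm typ} X_{ij} T_{\rm typ}$$

where the $X_{ij}$ are the same matrices from the first part of the proof (cf. ([?])). Again we have
a sum of mean-0 independent random matrices, in this case:

$$Y_{ij} \equiv T_{\rm typ} X_{ij} T_{\rm typ} = (I_{ij} - p(i,j)) \frac{A_{ij}}{\sqrt{d_{G_p^{\rm typ}}(i)\, d_{G_p^{\rm typ}}(j)}}, \mbox{ with } A_{ij} \mbox{ as in (3.2).}$$

In all possible cases, the eigenvalues of $Y_{ij}$ are contained in the set:

$$\left\{ 0,\ \frac{\pm(1 - p(i,j))}{\sqrt{d_{G_p^{\rm typ}}(i)\, d_{G_p^{\rm typ}}(j)}},\ \frac{\pm p(i,j)}{\sqrt{d_{G_p^{\rm typ}}(i)\, d_{G_p^{\rm typ}}(j)}} \right\}$$

and therefore

$$\|Y_{ij}\| \le \frac{1}{\sqrt{d_{G_p^{\rm typ}}(i)\, d_{G_p^{\rm typ}}(j)}} \le \frac{1}{d}.$$

The sum of variances is:

$$\sum_{1 \le i \le j \le n} \mathbb{E}[Y_{ij}^2] = \sum_{1 \le i \le j \le n} \mathbb{E}[(I_{ij} - p(i,j))^2]\, \frac{A_{ij}^2}{d_{G_p^{\rm typ}}(i)\, d_{G_p^{\rm typ}}(j)}$$

$$\mbox{(use (3.3))} = \sum_{1 \le i < j \le n} p(i,j)(1 - p(i,j))\, \frac{e_i e_i^* + e_j e_j^*}{d_{G_p^{\rm typ}}(i)\, d_{G_p^{\rm typ}}(j)} + \sum_{i=1}^{n} p(i,i)(1 - p(i,i))\, \frac{e_i e_i^*}{d_{G_p^{\rm typ}}(i)^2}$$

$$= \sum_{i=1}^{n} \frac{1}{d_{G_p^{\rm typ}}(i)} \left( \sum_{j=1}^{n} \frac{p(i,j)(1 - p(i,j))}{d_{G_p^{\rm typ}}(j)} \right) e_i e_i^*.$$

Again we have a diagonal matrix. Its $(i,i)$-th entry is at most:

$$\frac{1}{d_{G_p^{\rm typ}}(i)} \sum_{j=1}^{n} \frac{p(i,j)}{d} = \frac{1}{d}.$$

We may thus apply Corollary 7.1 to $\sum_{ij} Y_{ij}$ with $M = \sigma^2 = 1/d$ to obtain

$$\mathbb{P}\left(\|\mathcal{L}_p^{\rm typ} - \mathcal{M}\| \ge t\right) \le 2n\, e^{-\frac{t^2 d}{8 + 4t}}.$$

To finish the proof, we take:

$$t = 4\sqrt{\frac{\ln(4n/\delta)}{d}}.$$

We have already ensured that $t \le 3/4 \le 2$. This implies

$$\mathbb{P}\left(\|\mathcal{L}_p^{\rm typ} - \mathcal{M}\| \ge 4\sqrt{\frac{\ln(4n/\delta)}{d}}\right) \le 2n\, e^{-\frac{16 \ln(4n/\delta)}{16}} = \frac{\delta}{2}.$$

This was precisely the required bound. □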
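As an informal sanity check of the adjacency-matrix part of Theorem 3.1, one can sample a graph and compare $\|A_p - A_p^{\rm typ}\|$ with $4\sqrt{\Delta \ln(n/\delta)}$. The sketch below rests on several assumptions that are not part of the paper: the function names are hypothetical, the model is the homogeneous one ($p(i,j) = p$ off the diagonal, $0$ on it), the norm is estimated by plain power iteration (which can only underestimate it), and $n$ is far too small for the asymptotic condition $\Delta > C \ln n$ to be meaningful.

```python
import math
import random

def typical_adjacency(n, p):
    """A^typ for the homogeneous case: p(i, j) = p off the diagonal, 0 on it."""
    return [[0.0 if i == j else p for j in range(n)] for i in range(n)]

def sample_adjacency(P, rng):
    """Sample A_p: each pair i <= j is an edge independently with probability P[i][j]."""
    n = len(P)
    A = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i, n):
            if rng.random() < P[i][j]:
                A[i][j] = A[j][i] = 1.0
    return A

def spectral_norm_estimate(B, iters=100):
    """Power iteration for a symmetric B; the value returned never exceeds ||B||."""
    n = len(B)
    v = [1.0 / (k + 1) for k in range(n)]
    s = math.sqrt(sum(x * x for x in v))
    v = [x / s for x in v]
    est = 0.0
    for _ in range(iters):
        w = [sum(B[i][j] * v[j] for j in range(n)) for i in range(n)]
        est = math.sqrt(sum(x * x for x in w))
        if est == 0.0:
            break
        v = [x / est for x in w]
    return est

def deviation_and_bound(n, p, delta, seed=0):
    """Return (estimate of ||A_p - A^typ||, 4 sqrt(Delta ln(n / delta)))."""
    rng = random.Random(seed)
    P = typical_adjacency(n, p)
    A = sample_adjacency(P, rng)
    B = [[A[i][j] - P[i][j] for j in range(n)] for i in range(n)]
    Delta = max(sum(row) for row in P)   # maximum expected degree
    bound = 4.0 * math.sqrt(Delta * math.log(n / delta))
    return spectral_norm_estimate(B), bound

print(deviation_and_bound(60, 0.5, 0.1))
```

For such small $n$ the bound is far from tight, so this only illustrates how the quantities in the theorem are assembled rather than the sharpness of the constants.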

Remark 3.1 (Comparing concentration bounds) We now explain why the Hoeffding bound
of Christofides and Markström [21] is insufficient for our purposes. In the case of the adjacency
matrix, the random sum we deal with is $\sum_{ij} (I_{ij} - p(i,j)) A_{ij}$. We observed above that $A_{ij}$ has
eigenvalues $1$, $-1$ and $0$, hence we would have to take $r_i = 1/2$ in order to apply Theorem 1.3
to $(A_p - A_p^{\rm typ})/2$. A simple calculation shows that the exponent in that bound would be of the
order $-t^2/\binom{n}{2}$ for small enough $t$, which is much worse than the $-t^2/\Delta$ behavior we obtain.
Our improvement comes from the fact that our "variance" term is the largest eigenvalue of a
sum, not the sum of largest eigenvalues. Similar comments apply to the concentration of the
Laplacian.



4 The Erdős-Rényi graph and quasi-randomness

As a first illustration of Theorem 3.1, we apply our results to the Erdős-Rényi graphs. Our
bounds are suboptimal in this very special case, but the stronger results in [?, 33] require
more difficult arguments that do not seem to generalize to other cases of bond percolation (cf.
Section 5). Moreover, our result correctly predicts the range of $p$ for which one can expect
concentration of the adjacency matrix.

We then connect concentration to the theory of quasi-randomness for dense graphs [?],
showing that, in a certain sense, quasi-randomness is equivalent to concentration of the adjacency
matrix.

While we will not dwell on this point, a similar connection could be presented between
random graphs with given expected degrees [28] and concentration of the Laplacian. Our bounds
are also suboptimal in this setting, as attested by a recent preprint of Coja-Oghlan and Lanka
[29].



4.1 Concentration for the Erdős-Rényi graph

For $0 < p < 1$, the Erdős-Rényi graph $G_{n,p}$ [?, ?] is the special case of the model $G_p$ in
Section 3 where $p(i,j) = p$ for $i \neq j$ and $p(i,i) = 0$ for $i = j$. Notice that in this case
$A_p^{\rm typ} = p(1_n 1_n^* - I_n)$ where $1_n \in \mathbb{R}^n$ is the all-ones vector and $I_n$ is the $n \times n$ identity matrix.
Moreover, $\mathcal{L}_p^{\rm typ} = I_n - 1_n 1_n^*/n$.

The following result is immediate from Theorem 3.1.

Proposition 4.1 There exists $C > 0$ such that for all $n \in \mathbb{N}$, $n^{-2} \le \delta \le 1/2$ and $p \in (0,1)$ with
$p(n - 1) > C \ln n$, if $A_{n,p}$ is the adjacency matrix and $\mathcal{L}_{n,p}$ the Laplacian of the Erdős-Rényi
graph $G_{n,p}$, then

$$\mathbb{P}\left(\|A_{n,p} - p(1_n 1_n^* - I_n)\| \le 4\sqrt{p(n - 1)\ln(n/\delta)}\right) \ge 1 - \delta$$

and

$$\mathbb{P}\left(\|\mathcal{L}_{n,p} - (I_n - 1_n 1_n^*/n)\| \le 14\sqrt{\frac{\ln(4n/\delta)}{p(n - 1)}}\right) \ge 1 - \delta.$$

This result is qualitatively sharp in the sense that one cannot expect that the Laplacian
concentrates when $pn \ll \ln n$. To see this, recall that the multiplicity of $0$ in the spectrum of
$\mathcal{L}_{n,p}$ is the number of connected components of $G_{n,p}$ (this is a deterministic statement; cf. [?]).
If $pn \ll \ln n$, the probability of there being 2 or more components is bounded away from $0$.
But if $0$ has multiplicity $\ge 2$, (3.1) implies that $\|\mathcal{L}_{n,p} - (I_n - 1_n 1_n^*/n)\| \ge 1$, therefore $\mathcal{L}_{n,p}$ is
far from the "typical Laplacian" with positive probability.
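The deterministic fact used above (each connected component contributes a zero eigenvalue of the Laplacian) can be illustrated on a small graph. The sketch below uses hypothetical helper names and assumes a simple graph with no loops, no repeated edges and no isolated vertices; the last assumption matters because, with the convention of Section 2.3, an isolated vertex gives a diagonal entry 1 in the Laplacian. For each component $S$, the vector with entries $\sqrt{d(i)}$ on $S$ and $0$ elsewhere is sent to $0$ by $\mathcal{L} = I - TAT$.

```python
import math

def components(n, edges):
    """Connected components of a graph on vertices 0..n-1, via union-find."""
    parent = list(range(n))

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for i, j in edges:
        parent[find(i)] = find(j)
    groups = {}
    for v in range(n):
        groups.setdefault(find(v), []).append(v)
    return list(groups.values())

def laplacian_times(n, edges, x):
    """Return L x for L = I - T A T, assuming no loops, no repeated edges
    and no isolated vertices (so every degree is positive)."""
    deg = [0] * n
    for i, j in edges:
        deg[i] += 1
        deg[j] += 1
    y = list(x)
    for i, j in edges:
        w = 1.0 / math.sqrt(deg[i] * deg[j])
        y[i] -= w * x[j]
        y[j] -= w * x[i]
    return y

def kernel_vector(n, edges, component):
    """Vector with sqrt(d(i)) on the given component and 0 elsewhere."""
    deg = [0] * n
    for i, j in edges:
        deg[i] += 1
        deg[j] += 1
    inside = set(component)
    return [math.sqrt(deg[v]) if v in inside else 0.0 for v in range(n)]

# A path 0-1-2 and a separate edge 3-4: two components.
edges = [(0, 1), (1, 2), (3, 4)]
for comp in components(5, edges):
    x = kernel_vector(5, edges, comp)
    print(comp, laplacian_times(5, edges, x))
```

Since the two kernel vectors have disjoint supports, they are linearly independent, so $0$ has multiplicity at least 2 for this disconnected example.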



Quantitatively, the bounds in Proposition 4.1 can be improved. We quickly sketch the argument for the adjacency matrix, which is implicit in the work of Feige and Ofek [33]. A key idea is that, since the typical adjacency matrix $p(\mathbf{1}_n\mathbf{1}_n^* - I_n)$ has one very large eigenvalue and lots of small ones, the same should hold for $A_{n,p}$.

One can use the reasoning in [33, Lemma 2.1] to show that, for $pn = \Omega(\ln n)$, the dominant eigenvector of $A_{n,p}$ is always close to $\mathbf{1}_n/\sqrt{n}$. Moreover, the largest eigenvalue is $pn + O(\sqrt{pn})$ and all others are of the order $O(\sqrt{pn})$ [33], |3q| . This shows that, with probability $\ge 1 - \delta/2$,

$$\|A_{n,p} - p\mathbf{1}_n\mathbf{1}_n^*\| = O(\sqrt{pn}),$$

and this results in

$$\|A_{n,p} - p(\mathbf{1}_n\mathbf{1}_n^* - I_n)\| = O(\sqrt{pn}) \quad \text{with probability} \ge 1 - \delta,$$

because $\|pI_n\| = O(1)$.
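
As a rough numerical illustration of the bound in Proposition 4.1 (not of the sharper $O(\sqrt{pn})$ estimate), the following Python sketch samples a single Erdős-Rényi graph and compares a power-iteration estimate of $\|A_{n,p} - p(\mathbf{1}_n\mathbf{1}_n^* - I_n)\|$ with $4\sqrt{p(n-1)\ln(n/\delta)}$. The helper names `er_adjacency` and `spectral_norm`, as well as the values of `n`, `p` and `delta`, are illustrative assumptions and do not come from the paper; since the constant $C$ in the hypothesis is not explicit, this is only a sanity check on one sample.

```python
import math
import random


def er_adjacency(n, p, rng):
    """Sample a symmetric 0/1 adjacency matrix of G(n, p) with zero diagonal."""
    A = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            if rng.random() < p:
                A[i][j] = A[j][i] = 1.0
    return A


def spectral_norm(M, iters=200, seed=0):
    """Power-iteration estimate (a lower bound) of max |eigenvalue| of symmetric M."""
    n = len(M)
    rng = random.Random(seed)
    v = [rng.gauss(0.0, 1.0) for _ in range(n)]
    scale = math.sqrt(sum(x * x for x in v))
    v = [x / scale for x in v]
    estimate = 0.0
    for _ in range(iters):
        w = [sum(M[i][j] * v[j] for j in range(n)) for i in range(n)]
        estimate = math.sqrt(sum(x * x for x in w))  # ||M v|| with ||v|| = 1
        if estimate == 0.0:
            break
        v = [x / estimate for x in w]
    return estimate


n, p, delta = 60, 0.5, 0.1  # illustrative values only
A = er_adjacency(n, p, random.Random(1))
# Deviation from the typical matrix p(1 1^* - I): subtract p off the diagonal.
D = [[A[i][j] - (p if i != j else 0.0) for j in range(n)] for i in range(n)]
deviation = spectral_norm(D)
bound = 4.0 * math.sqrt(p * (n - 1) * math.log(n / delta))
print(f"estimated deviation = {deviation:.2f}, bound = {bound:.2f}")
```

Because power iteration only gives a lower estimate of the norm, a small printed deviation is suggestive rather than a proof; for dense matrices of this size the estimate is typically close to the true value.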

4.2 Quasi-randomness as concentration of the adjacency matrix 

We now point out that the idea of concentration of the adjacency matrix is implicit in the theory of dense quasi-random graphs.

This theory was initiated by Chung, Graham and Wilson |^3| . Their surprising discovery was that several properties that an Erdős-Rényi random graph is very likely to have are in fact equivalent.

More precisely, let $\{G_m\}_{m\in\mathbb{N}}$ be a sequence of graphs, each $G_m$ having $n_m$ vertices and adjacency matrix $A_m$. Assume that $n_m \to +\infty$ when $m \to +\infty$ and that $p > 0$ is fixed. The following statements (among others) are equivalent [23, 43]. [The asymptotic notation refers to $m \to +\infty$.]

• [Q1] There exists an $s \ge 4$ such that for all $0 \le k \le \binom{s}{2}$, $G_m$ contains more than $p^{k}(1-p)^{\binom{s}{2}-k}n_m^{s}$ induced labeled copies of each graph on $s$ vertices and $k$ edges.

• [Q2] $G_m$ has $\ge (1+o(1))pn_m^2/2$ edges and $\le (1+o(1))(pn_m)^4$ labeled copies of the four-cycle $C_4$.

• [Q3] $G_m$ has $\ge (1+o(1))pn_m^2/2$ edges, the largest eigenvalue of $A_m$ is $(1+o(1))pn_m$ and all other eigenvalues of $A_m$ are $o(n_m)$ in absolute value.

• [Q4] $\max_{S\subset V_m}\left|e(S) - p|S|^2/2\right| = o(n_m^2)$, where $e(S)$ is the number of edges of $G_m$ inside $S$ and $V_m$ is the vertex set of $G_m$.

We now provide a characterization of quasi-randomness in terms of "concentration" of the adjacency matrix. Let

$$A_m^{\mathrm{typ}} = p(\mathbf{1}_{n_m}\mathbf{1}_{n_m}^* - I_{n_m}),$$

where $\mathbf{1}_{n_m} \in \mathbb{R}^{n_m}$ is (again) the all-ones vector and $I_{n_m}$ is the $n_m \times n_m$ identity matrix. This is the same matrix that appears in Proposition 4.1.

The following result shows that a sequence of graphs is quasi-random if and only if the adjacency matrices of the graphs are sufficiently close to $A_m^{\mathrm{typ}}$.

Proposition 4.2 A sequence $\{G_m\}_m$ of graphs as above satisfies properties [Q1]-[Q4] above if and only if:

[P1] $\|A_m - A_m^{\mathrm{typ}}\| = o(n_m)$.



Proof: [of Proposition 4.2] We will show that [P1] is equivalent to [Q3] in the previous list.

[P1] $\Rightarrow$ [Q3]: The eigenvalues of $A_m^{\mathrm{typ}}$ are $p(n_m - 1)$ (with multiplicity 1) and $-p$ (with multiplicity $n_m - 1$). We use inequality (3.1) above to deduce that:

$$|\lambda_{n_m-1}(A_m) - pn_m| = |\lambda_{n_m-1}(A_m) - \lambda_{n_m-1}(A_m^{\mathrm{typ}})| + O(1) = o(n_m)$$

and for $0 \le i \le n_m - 2$:

$$|\lambda_i(A_m)| = |\lambda_i(A_m) + p| + O(1) = |\lambda_i(A_m) - \lambda_i(A_m^{\mathrm{typ}})| + O(1) = o(n_m).$$

Moreover, the number of edges in $G_m$ is:

$$\frac{1}{2}\mathbf{1}_{n_m}^*A_m\mathbf{1}_{n_m} \ge \frac{1}{2}\mathbf{1}_{n_m}^*A_m^{\mathrm{typ}}\mathbf{1}_{n_m} - \frac{1}{2}\left|\mathbf{1}_{n_m}^*(A_m - A_m^{\mathrm{typ}})\mathbf{1}_{n_m}\right| \ge \frac{pn_m(n_m-1)}{2} - \frac{n_m}{2}\|A_m - A_m^{\mathrm{typ}}\| = \frac{pn_m(n_m-1)}{2} - o(n_m^2).$$

[Q3] $\Rightarrow$ [P1]: It is immediate from [Q3] that $A_m$ is $o(n_m)$-close to a rank-one operator: if $\psi_{\max}$ is the (normalized) eigenvector corresponding to the largest eigenvalue $\lambda_{\max}(A_m)$, then:

$$\|A_m - \lambda_{\max}(A_m)\psi_{\max}\psi_{\max}^*\| = \max_{0\le i<n_m-1}|\lambda_i(A_m)| = o(n_m).$$

By [Q3] we also know that:

$$\|\lambda_{\max}(A_m)\psi_{\max}\psi_{\max}^* - pn_m\psi_{\max}\psi_{\max}^*\| = |\lambda_{\max}(A_m) - pn_m| = o(n_m).$$

It is shown in the proof of Fact 7 in [22] that, under [Q3], $\psi_{\max}$ is $o(1)$-close to $\mathbf{1}_{n_m}/\sqrt{n_m}$. Thus we see that:

$$\|pn_m\psi_{\max}\psi_{\max}^* - p\mathbf{1}_{n_m}\mathbf{1}_{n_m}^*\| = o(n_m).$$

Finally, we notice that $A_m^{\mathrm{typ}} - p\mathbf{1}_{n_m}\mathbf{1}_{n_m}^* = -pI_{n_m}$, hence

$$\|A_m^{\mathrm{typ}} - p\mathbf{1}_{n_m}\mathbf{1}_{n_m}^*\| = O(1).$$

Putting all the inequalities together implies the desired result. □

5 Application to bond percolation 

In the previous section we discussed a random graph model where the typical Laplacian and 
adjacency matrices had one "special" eigenvalue with multiplicity 1 and n — 1 "trivial" eigenval- 
ues. In this setting, proving concentration of the adjacency matrix (say) essentially amounted to 
showing that one eigenvector was close to what it should be while the other eigenvalues clustered 
around the degenerate eigenvalue of the typical case. 

We now consider a class of models for which one cannot expect this strategy to work. Let $p \in (0,1)$ and $G = (V,E)$ be an arbitrary unweighted graph on vertex set $V = [n]$. Consider the random subgraph $G_p$ of $G$ that is obtained by deleting each edge of $G$ independently with probability $1 - p$. This model of bond percolation has received much attention in recent years, with a special focus on the emergence of a giant component [[h], |7|> |, |7|, ||, [DJ. Much less seems to be known about the spectrum of $G_p$ [^] .

In this section we apply our general result, Theorem 3.1, in order to answer the following question: how large does $p$ need to be in order for the graph matrices to concentrate? Clearly, this must occur well above the percolation threshold.

Bond percolation is a special case of the random model $G_{\mathbf{p}}$ in Section 3. To see this, one only needs to define:

$$\mathbf{p}(i,j) = \begin{cases} p & \text{if } ij \in E;\\ 0 & \text{if not.}\end{cases}$$

A computation shows that the "typical matrices" for this choice of $\mathbf{p}$ are:

$A^{\mathrm{typ}}_{\mathbf{p}} = pA_G$, where $A_G$ is the adjacency matrix of $G$;
$\mathcal{L}^{\mathrm{typ}}_{\mathbf{p}} = \mathcal{L}_G$, where $\mathcal{L}_G$ is the Laplacian of $G$.

Moreover, the parameters $d$, $\Delta$ appearing in Theorem 3.1 are $pd_G$ and $p\Delta_G$, where $d_G$ (resp. $\Delta_G$) is the minimum (resp. maximum) degree in $G$.
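
The identity for the typical Laplacian can be checked on a small example. The sketch below assumes the normalized Laplacian $I - D^{-1/2}AD^{-1/2}$ (the definition is given outside this section, so this form is an assumption here) and verifies that scaling all edge weights by $p$ leaves it unchanged; the function name `normalized_laplacian` and the four-vertex graph are hypothetical choices made for illustration.

```python
import math


def normalized_laplacian(W):
    """Return I - D^{-1/2} W D^{-1/2} for a symmetric weight matrix W with positive degrees."""
    n = len(W)
    deg = [sum(row) for row in W]
    return [
        [(1.0 if i == j else 0.0) - W[i][j] / math.sqrt(deg[i] * deg[j])
         for j in range(n)]
        for i in range(n)
    ]


# A small connected graph on 4 vertices (illustrative only).
A_G = [
    [0, 1, 1, 0],
    [1, 0, 1, 1],
    [1, 1, 0, 0],
    [0, 1, 0, 0],
]
p = 0.3
pA_G = [[p * a for a in row] for row in A_G]  # typical adjacency matrix p * A_G

L_G = normalized_laplacian(A_G)
L_typ = normalized_laplacian(pA_G)
max_gap = max(abs(L_G[i][j] - L_typ[i][j]) for i in range(4) for j in range(4))
print(f"max entrywise difference = {max_gap:.2e}")
```

The degrees scale by $p$ exactly as the weights do, which is why the normalization cancels the factor; the expected degrees themselves become $p$ times the degrees in $G$, matching the parameters $pd_G$ and $p\Delta_G$.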



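The following result is a direct corollary of Theorem 3.1.

Theorem 5.1 For each $c > 0$ there exists a $C > 0$ such that the following holds. Suppose that $G$, $p$ and $G_p$ are as above and $pd_G \ge C\ln n$. Then:

$$\mathbb{P}\left(\|A_{G_p} - pA_G\| \le 4\sqrt{p\Delta_G\ln(n/\delta)}\right) \ge 1 - \delta$$

and

$$\mathbb{P}\left(\|\mathcal{L}_{G_p} - \mathcal{L}_G\| \le 14\sqrt{\frac{\ln(4n/\delta)}{pd_G}}\right) \ge 1 - \delta,$$

where $A_{G_p}$ and $\mathcal{L}_{G_p}$ are the adjacency matrix and Laplacian of $G_p$ (resp.).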

One can of course derive corollaries about eigenvalues and eigenvectors following Corollaries 3.1 and [3^. For instance, suppose that:

$$\gamma > 14\sqrt{\frac{\ln(4n/\delta)}{pd_G}}.$$

Then the following holds with probability $\ge 1 - \delta$: for each $0 \le i \le n-1$ such that the interval $(\lambda_i(\mathcal{L}_G) - 2\gamma, \lambda_i(\mathcal{L}_G) + 2\gamma)$ contains no eigenvalues of $\mathcal{L}_G$ other than $\lambda_i(\mathcal{L}_G)$, $\lambda_i(\mathcal{L}_{G_p})$ has multiplicity 1 in the spectrum of $\mathcal{L}_{G_p}$ and moreover, the corresponding normalized eigenvectors $\psi$, $\psi_p$ of $\mathcal{L}_G$ and $\mathcal{L}_{G_p}$ (resp.) satisfy:

$$\|\psi\psi^* - \psi_p\psi_p^*\| \le \frac{4\cdot 14\sqrt{\ln(4n/\delta)/(pd_G)}}{\pi\left(\gamma - 14\sqrt{\ln(4n/\delta)/(pd_G)}\right)}.$$

This implies:

$$1 - (\psi^*\psi_p)^2 \le \left(\frac{4\cdot 14\sqrt{\ln(4n/\delta)/(pd_G)}}{\pi\left(\gamma - 14\sqrt{\ln(4n/\delta)/(pd_G)}\right)}\right)^2$$

for the same eigenvectors, which implies that $\psi_p$ is close to $\psi$ or $-\psi$. A similar result for the eigenspace projectors could be derived even if $\lambda_i(\mathcal{L}_G)$ had higher multiplicity. It seems quite remarkable that one can approximately obtain the eigenvectors or eigenspaces of $G$ from a (potentially very sparse) subgraph $G_p$.

We also note that the threshold for Laplacian concentration is indeed $pd_G = \Theta(\ln n)$, as shown in Section 4.1 in the special case of the Erdős-Rényi random graph $G_{n,p}$.

The following simple corollary is also of interest.

Corollary 5.1 There exist $C, C' > 0$ such that, if $pd_G \ge C\ln n$, then with probability $\ge 1 - 1/n^2$,

$$|\lambda(G) - \lambda(G_p)| \le C'\sqrt{\frac{\ln n}{pd_G}}.$$

We have singled out this bound in order to compare it with a recent bound of Chung and Horn [26]. These authors proved that, with high probability,

$$\lambda(G_p) \ge \lambda(G) - O\left(\sqrt{\frac{\ln n}{pd_G}} + \frac{(\ln n)^{3/2}}{pd_G(\ln\ln n)^{3/2}}\right).$$

Our bound is better for all values of $n$ and $pd_G$, most dramatically for $\ln n \ll pd_G \ll \ln^{3/2-\epsilon}n$, in which case their bound is vacuous while ours is non-trivial.



6 Application to inhomogeneous random graphs 

In this section we consider a more complex random graph model that is defined in terms of an attachment kernel $\kappa$, a density parameter $0 < p < 1$ and a set of points $X_1, X_2, \dots, X_n$.

More precisely, let $\kappa : [0,1]^2 \to \mathbb{R}_+ \cup \{0\}$ be a measurable function that is symmetric in the sense that for all $x, y \in [0,1]$, $\kappa(x,y) = \kappa(y,x)$. Pick some vector $X_{1:n} = (X_1, \dots, X_n)$ of points in $[0,1]$. Now consider the following weight function $\mathbf{p} : [n]^2 \to [0,1]$:

$$\mathbf{p}(i,j) = \min\{p\kappa(X_i,X_j), 1\}, \quad (i,j) \in [n]^2. \qquad (6.1)$$

One can define a random graph $G_{\mathbf{p}}$ as in Section 3 with the above weight function; we call this graph $G_{n,p,\kappa}$, the inhomogeneous random graph on $n$ vertices, density parameter $p$ and attachment kernel $\kappa$ (the dependency on $X_{1:n}$ is implicit in this nomenclature). The adjacency matrix of this random graph will be denoted by $A_{n,p,\kappa}$.

Our goal in this section will be to prove that, up to some error terms that are small with high probability, the normalized adjacency matrix $A_{n,p,\kappa}/pn$ of $G_{n,p,\kappa}$ is related to the integral operator on $L^2([0,1])$ that is defined by $\kappa$:

$$T_\kappa : L^2([0,1]) \to L^2([0,1]), \qquad f(\cdot) \mapsto \int_0^1 \kappa(\cdot,y)f(y)\,dy. \qquad (6.2)$$

Similar results for the Laplacian of $G_{n,p,\kappa}$ are discussed in Section ||.






6.1 Some history of the model 

The phrase "inhomogeneous random graph" comes from a paper by Bollobás, Janson and Riordan [11] where the above model was studied in the range $p = \Theta(1/n)$ with background spaces more general than $[0,1]$. Their goal was to study the structure of connected components in the general model, in analogy with the well-known Erdős-Rényi phase transition at $p = 1/n$.

A related random graph model generating dense graphs ($p = 1$) was introduced in H and studied in [16]. This model is related to the beautiful theory of graph limits where the space of graphs is "completed" into the space of graphons, which are non-negative, symmetric functions like $\kappa$ above, with the further restriction that $\kappa \le 1$. There is a fairly complete correspondence between the properties of sequences of graphs that are convergent in terms of normalized subgraph counts and the corresponding limiting graphon. Conversely, the sequence of random graphs corresponding to a given graphon $\kappa$ converges to that same graphon. The cut metric that defines graph convergence will be further discussed in Section 6.3 below.

The connection between the convergent graph sequences and inhomogeneous random graphs was noted in (l^, [T^l , where the authors studied bond percolation over a convergent sequence of graphs and found the critical probability for existence of a giant component. Other papers [13, 14] have focused on the relationship between convergence of subgraph counts vs. convergence in the cut metric (see below) for sparse graphs, a topic that is far from completely elucidated. In what follows we will show that our random graphs converge to the corresponding kernel in a stronger metric.

6.2 The precise result 

We will use the following technical assumption. 

Assumption 6.1 $\kappa : [0,1]^2 \to \mathbb{R}_+ \cup \{0\}$ is a symmetric measurable function with

$$K = \sup_{(x,y)\in[0,1]^2}\kappa(x,y) < +\infty.$$

Moreover, the points $X_1, X_2, \dots, X_n$ are random i.i.d. uniform over $[0,1]$.

Let $X_{(1)} \le X_{(2)} \le \dots \le X_{(n)}$ be the ordered sequence of the $X_i$, i.e. $X_{(1)}$ is the minimum of the $X_i$, $X_{(2)}$ is the second smallest element and so on (ties are broken arbitrarily). Let $\sigma_n$ be a permutation such that $X_i = X_{(\sigma_n(i))}$ for each $1 \le i \le n$, chosen in a measurable manner. Associate with the graph $G_{n,p,\kappa}$ a symmetric, non-negative function from $[0,1]^2$ to $\mathbb{R}_+ \cup \{0\}$:

$$\mathcal{G}_{n,p,\kappa} = \frac{1}{p}\sum_{ij\in E(G_{n,p,\kappa})}\chi_{\left(\frac{\sigma_n(i)-1}{n},\frac{\sigma_n(i)}{n}\right]\times\left(\frac{\sigma_n(j)-1}{n},\frac{\sigma_n(j)}{n}\right]},$$

where $\chi_S$ is the indicator function of the set $S$ and $E(G_{n,p,\kappa})$ is the edge set of $G_{n,p,\kappa}$. Notice that $\mathcal{G}_{n,p,\kappa}$ defines a bounded linear operator on $L^2([0,1])$ via a formula similar to (6.2):

$$(T_{\mathcal{G}_{n,p,\kappa}}f)(\cdot) = \int_0^1 \mathcal{G}_{n,p,\kappa}(\cdot,y)f(y)\,dy \quad (f \in L^2([0,1])).$$

Let $\{e_i\}_{i=1}^n$ be the canonical basis of $\mathbb{R}^n$. Let us consider two linear operators (both of which depend on $\sigma_n$ defined previously):

$$H_n : \mathbb{R}^n \to L^2([0,1]), \qquad \psi = \sum_{i=1}^n \psi(i)e_i \mapsto \sum_{i=1}^n \sqrt{n}\,\psi(i)\,\chi_{\left(\frac{\sigma_n(i)-1}{n},\frac{\sigma_n(i)}{n}\right]}(\cdot),$$

$$E_n : L^2([0,1]) \to \mathbb{R}^n, \qquad f \mapsto \sqrt{n}\sum_{i=1}^n\left(\int_{\frac{\sigma_n(i)-1}{n}}^{\frac{\sigma_n(i)}{n}} f(x)\,dx\right)e_i,$$

and note that $T_{\mathcal{G}_{n,p,\kappa}} = H_nA_{n,p,\kappa}E_n/pn$.

Finally, let $\mathrm{spec}(T_\kappa)$ be the spectrum of the operator $T_\kappa$ in (6.2) (see Section 2.2 to recall what the spectrum is).



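Theorem 6.1 (proven in Section 6.4) There exist universal constants $c, C > 0$ such that the following holds under Assumption 6.1. Given $\epsilon > 0$, suppose there exists an $L$-Lipschitz function $\kappa_\epsilon$ that also takes values in $[0,K]$ and which is $\epsilon$-close to $\kappa$ in the $L^2([0,1]^2)$ norm. Define:

$$\theta = \theta(\kappa,\epsilon,L,K,n,p) = 2\epsilon + c(L+K)\left(\frac{\ln n}{n}\right)^{1/4} + c\sqrt{\frac{K\ln n}{pn}}$$

and assume $pn \ge C\ln n$ and $p \le 1/K$. Then there exists an event $\mathcal{E}$ with probability $\mathbb{P}(\mathcal{E}) \ge 1 - n^{-2}$ such that, inside $\mathcal{E}$, the following properties hold:

1. The $n \times n$ matrices $A_{n,p,\kappa}$ and $E_nT_\kappa H_n$ satisfy:

$$\left\|\frac{A_{n,p,\kappa}}{pn} - E_nT_\kappa H_n\right\| \le \theta.$$

2. The integral operators $T_{\mathcal{G}_{n,p,\kappa}}$ and $T_\kappa$ satisfy:

$$\|T_{\mathcal{G}_{n,p,\kappa}} - T_\kappa\|_{L^2\to L^2} \le \theta.$$

3. Given $S \subset \mathbb{R}$, let $m_{A_{n,p,\kappa}/pn}(S)$ be the sum of the multiplicities of all eigenvalues of $A_{n,p,\kappa}/pn$ that lie in $S$ and define $m_{T_\kappa}(S)$ similarly. Then if $\inf_{s\in S}|s| > \theta$,

$$m_{A_{n,p,\kappa}/pn}(S) \le m_{T_\kappa}(S^\theta) \quad\text{and}\quad m_{T_\kappa}(S) \le m_{A_{n,p,\kappa}/pn}(S^\theta),$$

where $S^\theta = \{x \in \mathbb{R} : \exists s \in S, |x - s| \le \theta\}$.

4. Consider each pair $(\alpha,\gamma)$ where $\alpha \in \mathrm{spec}(T_\kappa)$ and $\gamma > \theta$ is such that $(\alpha - 2\gamma, \alpha + 2\gamma)$ contains no eigenvalue of $T_\kappa$ other than $\alpha$ itself. Let $P_\alpha$ be the orthogonal projection in $L^2([0,1])$ onto the eigenspace of $\alpha$ and consider the orthogonal projection $\Pi_{(\alpha-\gamma)pn,(\alpha+\gamma)pn}(A_{n,p,\kappa})$ of $\mathbb{R}^n$ onto the span of the eigenvectors of $A_{n,p,\kappa}$ corresponding to eigenvalues in $[(\alpha-\gamma)pn, (\alpha+\gamma)pn]$. Then:

$$\left\|\Pi_{(\alpha-\gamma)pn,(\alpha+\gamma)pn}(A_{n,p,\kappa}) - E_nP_\alpha H_n\right\| \le \frac{4\theta}{\pi(\gamma-\theta)}.$$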

This Theorem implies that, up to error terms that are small with high probability, $A_{n,p,\kappa}/pn$ is defined solely in terms of the kernel function $\kappa$, up to a permutation of coordinates. In statistical parlance, it also implies that the non-zero eigenvalues of $A_{n,p,\kappa}/pn$ are strongly consistent estimators of the non-zero eigenvalues of $T_\kappa$ when $n \to +\infty$ and $p = p(n)$ satisfies $pn/\ln n \to +\infty$.

Both of these assertions hinge on the fact that Lipschitz functions are dense in $L^2([0,1]^2)$. Unfortunately, our error bounds are not independent of $\kappa$, as the quality of the approximation by Lipschitz functions, measured by the size of the Lipschitz constant for a given approximation error $\epsilon$, may vary with $\kappa$. This is in contrast with approximation in the cut norm, which we now discuss.

6.3 Convergence in the operator and cut metrics 
6.3.1 The cut norm and the cut metric 

Any function $\eta \in L^1([0,1]^2)$ determines a bounded linear operator $T_\eta : L^\infty([0,1]) \to L^1([0,1])$ via the formula that we already used to define $T_\kappa$ and $T_{\mathcal{G}_{n,p,\kappa}}$:

$$(T_\eta f)(x) = \int_0^1 \eta(x,y)f(y)\,dy. \qquad (6.3)$$

The cut norm of $\eta$ is the $L^\infty \to L^1$ norm of $T_\eta$:

$$\|\eta\|_{\mathrm{cut}} = \sup\left\{\int_0^1 (T_\eta f)(x)g(x)\,dx : f, g \in L^\infty([0,1]),\ \|f\|_{L^\infty}, \|g\|_{L^\infty} \le 1\right\}. \qquad (6.4)$$

One can check that $\|\eta\|_{\mathrm{cut}} \le \|\eta\|_{L^1}$ always. This definition of $\|\eta\|_{\mathrm{cut}}$ is natural from the point of view of Functional Analysis; a more "combinatorial" definition,

$$\|\eta\|_{\mathrm{cut},2} = \sup\left\{\left|\int_{A\times B}\eta(x,y)\,dx\,dy\right| : A, B \subset [0,1] \text{ measurable}\right\},$$

is equivalent to the previous one in the sense that:

$$\frac{1}{4}\|\eta\|_{\mathrm{cut}} \le \|\eta\|_{\mathrm{cut},2} \le \|\eta\|_{\mathrm{cut}}.$$

Now assume that $G_1$ and $G_2$ are graphs with common vertex set $[n]$ and adjacency matrices $A_{G_1}$, $A_{G_2}$. Define, for $t = 1, 2$:

$$\kappa_{G_t,p}(x,y) = \frac{1}{p}\sum_{1\le i,j\le n:\ ij\in E(G_t)}\chi_{\left(\frac{i-1}{n},\frac{i}{n}\right]}(x)\,\chi_{\left(\frac{j-1}{n},\frac{j}{n}\right]}(y).$$

Then one sees that $\|\kappa_{G_1,p} - \kappa_{G_2,p}\|_{\mathrm{cut},2}$ is the normalized cut norm of $A_{G_1} - A_{G_2}$.

Thus the cut norm on $L^1([0,1]^2)$ induces a distance on graphs. Notice, however, that this distance might be positive even though $G_1$ and $G_2$ are isomorphic. This motivates the following definition: given two kernels $\kappa, \kappa'' \in L^1([0,1]^2)$, say that $\kappa''$ is a rearrangement of $\kappa$ ($\kappa'' \sim \kappa$) if there exists a measure-preserving bijection $\tau : [0,1] \to [0,1]$ such that $\kappa''(x,y) = \kappa(\tau(x),\tau(y))$ for almost every $(x,y) \in [0,1]^2$. The cut metric assigns to each pair $\kappa, \kappa'$ of kernels a distance:

$$d_{\mathrm{cut}}(\kappa,\kappa') = \inf\{\|\kappa'' - \kappa'\|_{\mathrm{cut}} : \kappa'' \sim \kappa\}.$$

Notice that the cut metric does not distinguish between (the kernels of) isomorphic graphs.
6.3.2 The operator norm and the operator metric 

The metric $d_{\mathrm{cut}}$ yields a criterion for convergence of graph sequences. In the dense case $p = \Theta(1)$, this implies the convergence of normalized subgraph counts and also gives a criterion for testable graph properties [49, 16]. As mentioned above, much less is understood about the case $p = o(1)$ (see however the conjectures of Bollobás and Riordan [13, Section 5.2]).

Theorem 6.1 is mostly concerned with the eigenvalues and the eigenvectors of the adjacency matrix $A_{n,p,\kappa}$. Unfortunately, in general we do not even know how to control the eigenvalues of $A_{n,p,\kappa}$ in terms of the cut norm alone. For bounded kernels ($p = \Theta(1)$), this is easy enough (see [17, Theorem 6.6]), but there are difficulties in extending this to the sparse case. This does seem to be a serious problem, as related difficulties appear in [13] when the authors attempt to relate the convergence of subgraph counts to cut metric convergence. [Estimating the eigenvalues is related to counting cycles in the corresponding graph or graphon.]

Luckily, a stronger notion of convergence implied by the $L^2 \to L^2$ norm suffices for our purpose, and it is precisely this notion that we achieve via our methods.

We need some definitions in order to properly state this. Given $\eta \in L^2([0,1]^2)$, define a bounded linear operator $T_\eta$ from $L^2([0,1])$ to itself via the formula in Section 2.2; this is the same as (6.3), except that the domain and range of $T_\eta$ are different. The operator or "$L^2 \to L^2$" norm of $\eta$ is the $L^2 \to L^2$ norm of $T_\eta$, also defined in Section 2.2:

$$\|\eta\|_{\mathrm{op}} = \|T_\eta\|_{L^2\to L^2}.$$

From (6.4) we see that $\|\eta\|_{\mathrm{op}} \ge \|\eta\|_{\mathrm{cut}}$ whenever $\eta$ is square-integrable.

In analogy with the cut metric, one can also define an operator (pseudo-)metric on square-integrable kernels via the formula:

$$d_{\mathrm{op}}(\kappa,\kappa') = \inf\{\|\kappa'' - \kappa'\|_{\mathrm{op}} : \kappa'' \sim \kappa\}.$$

One can show via our results that when Assumption 6.1 holds, $n \gg 1$ and $p \gg \ln n/n$, then the kernel determined by $G_{n,p,\kappa}$ (which is equivalent to $\mathcal{G}_{n,p,\kappa}$ in Theorem 6.1) converges in the $d_{\mathrm{op}}$ metric to $\kappa$. We omit the details.

A drawback of $d_{\mathrm{op}}$ is that it lacks a corresponding (weak or strong) regularity lemma, which would allow one to approximate up to error $\epsilon$ any (say bounded) kernel $\kappa$ by simple functions taking at most $m = m(\epsilon, \|\kappa\|_{L^\infty})$ values. Indeed, this is precisely why the bound in Theorem 6.1 depends on $\kappa$.



6.4 Proof of Theorem 6.1



The proof will consist of several steps. 

6.4.1 The relationship between $T_{\mathcal{G}_{n,p,\kappa}}$ and $A_{n,p,\kappa}$

For $f, g \in L^2([0,1])$, define $(f,g)_{L^2} = \int_0^1 f(x)g(x)\,dx$ and $\|f\|_{L^2}^2 = (f,f)_{L^2}$. The following facts can be easily checked (proof omitted).

$\forall f \in L^2([0,1])$, $\forall\psi \in \mathbb{R}^n$, $(f, H_n\psi)_{L^2} = (E_nf)^*\psi$ (i.e. $E_n$ is the adjoint of $H_n$); (6.5)

$\forall\psi, \phi \in \mathbb{R}^n$, $(H_n\psi, H_n\phi)_{L^2} = \psi^*\phi$ (i.e. $H_n$ is an isometry); (6.6)

$\forall f \in L^2$, $\|E_nf\| \le \|f\|_{L^2}$ (i.e. $E_n$ has operator norm at most 1); (6.7)

$E_nH_n = I_n$, the identity operator on $\mathbb{R}^n$; (6.8)

$H_nE_n = \Pi_n$, the projection onto the span of $\left\{\chi_{\left(\frac{i-1}{n},\frac{i}{n}\right]}\right\}_{i=1}^n$; and (6.9)

$T_{\mathcal{G}_{n,p,\kappa}} = \frac{1}{pn}H_nA_{n,p,\kappa}E_n$, as seen above. (6.10)

Let us now relate the non-zero eigenvalues and eigenvectors of $A_{n,p,\kappa}$ with those of $T_{\mathcal{G}_{n,p,\kappa}}$. Write:

$$A_{n,p,\kappa} = \sum_{\alpha:\ \alpha pn\in\mathrm{spec}(A_{n,p,\kappa})}(\alpha pn)\,\Pi_\alpha,$$

where each $\Pi_\alpha$ is the projection onto the eigenspace corresponding to $\alpha pn$. By (6.10),

$$T_{\mathcal{G}_{n,p,\kappa}} = \sum_{\alpha:\ \alpha pn\in\mathrm{spec}(A_{n,p,\kappa})}\alpha\,H_n\Pi_\alpha E_n.$$

Claim 6.1 The operators $H_n\Pi_\alpha E_n$ are orthogonal projections with orthogonal ranges. Therefore, the non-zero eigenvalues of $T_{\mathcal{G}_{n,p,\kappa}}$ are the numbers $\alpha \neq 0$ with $\alpha pn \in \mathrm{spec}(A_{n,p,\kappa})$. Moreover, for each such $\alpha$, $H_n\Pi_\alpha E_n$ is the projection onto the corresponding eigenspace of $T_{\mathcal{G}_{n,p,\kappa}}$.

Proof: [of the Claim] First notice that for each $\alpha$:

$$(H_n\Pi_\alpha E_n)^2 = H_n\Pi_\alpha E_nH_n\Pi_\alpha E_n = H_n\Pi_\alpha E_n,$$

because $E_nH_n = I_n$ (eqn. (6.8)) and $\Pi_\alpha^2 = \Pi_\alpha$. One can also check that for all $f, g \in L^2([0,1])$,

$$(f, H_n\Pi_\alpha E_ng)_{L^2} = (E_nf)^*(\Pi_\alpha E_ng) = (\Pi_\alpha E_nf)^*(E_ng) = (H_n\Pi_\alpha E_nf, g)_{L^2},$$

where we used (6.5) for the first and third equalities and the fact that $\Pi_\alpha = \Pi_\alpha^*$ for the second one. It follows that $H_n\Pi_\alpha E_n$ is a self-adjoint operator on $L^2$ that equals its square; this means that it is an orthogonal projection onto its range.

To see that these ranges are orthogonal for distinct $\alpha$, notice that the range of $H_n\Pi_\alpha E_n$ is the set of all vectors of the form $H_n\psi$ where $\psi$ belongs to the range of $\Pi_\alpha$ and is therefore an eigenvector of $A_{n,p,\kappa}$ with eigenvalue $\alpha pn$. But eigenvectors of $A_{n,p,\kappa}$ with distinct eigenvalues are orthogonal, hence their images under $H_n$ are orthogonal in $L^2$ (by (6.6)).

The other assertions follow directly. □



6.4.2 The concentration argument 

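Let us introduce a matrix $\hat{A}_{n,p,\kappa}$ whose $(i,j)$-th entry is $p\kappa(X_i,X_j)$, $1 \le i < j \le n$. Conditioning on the realization of the $X_1, \dots, X_n$, our random graph model has independent edges with respective probabilities $\mathbf{p}(i,j) = p\kappa(X_i,X_j)$ and $\hat{A}_{n,p,\kappa}$ is precisely the typical adjacency matrix $A^{\mathrm{typ}}_{\mathbf{p}}$ in this setting. We deduce from Theorem 3.1 that there exists a constant $C > 0$ independent of $n$, $\kappa$ and $X_1, \dots, X_n$, such that if $\Delta = \Delta(X_1, \dots, X_n)$ is as in that Theorem and $\Delta \ge C\ln n$,

$$\mathbb{P}\left(\|A_{n,p,\kappa} - \hat{A}_{n,p,\kappa}\| > 4\sqrt{\Delta\ln(2n^2)}\ \Big|\ X_1, \dots, X_n\right) \le \frac{1}{2n^2}.$$

In our setting we always have

$$\Delta = \max_{1\le i\le n}\sum_{j=1}^n p\kappa(X_i,X_j) \le Kpn,$$

where $K$ is the quantity in Assumption 6.1. Therefore

$$\mathbb{P}\left(\|A_{n,p,\kappa} - \hat{A}_{n,p,\kappa}\| > 4\sqrt{Kpn\ln(2n^2)}\right) \le \frac{1}{2n^2}.$$

Let $\hat{T} = H_n\hat{A}_{n,p,\kappa}E_n/pn$, which is the integral operator with kernel

$$\sum_{1\le i,j\le n}\kappa(X_i,X_j)\,\chi_{\left(\frac{\sigma_n(i)-1}{n},\frac{\sigma_n(i)}{n}\right]}(x)\,\chi_{\left(\frac{\sigma_n(j)-1}{n},\frac{\sigma_n(j)}{n}\right]}(y). \qquad (6.11)$$

Since $H_n$ is an isometry (by (6.6)) and $E_n$ has norm at most 1 (by (6.7)),

$$\|\hat{T} - T_{\mathcal{G}_{n,p,\kappa}}\|_{L^2\to L^2} = \frac{1}{pn}\|H_n(A_{n,p,\kappa} - \hat{A}_{n,p,\kappa})E_n\| \le 4\sqrt{\frac{K\ln(2n^2)}{pn}}$$

with probability $\ge 1 - 1/2n^2$.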

6.4.3 Nearing the end of the argument 



We will show in Lemma 6.1 below that there exists a universal $c > 0$ such that for any $\epsilon > 0$,

$$\mathbb{P}\left(\|\hat{T} - T_\kappa\|_{L^2\to L^2} \le 2\epsilon + c(L+K)\left(\frac{\ln n}{n}\right)^{1/4}\right) \ge 1 - \frac{1}{2n^2}. \qquad (6.12)$$

Increasing $c$ if necessary, this implies that, with probability $\ge 1 - n^{-2}$,

$$\|T_{\mathcal{G}_{n,p,\kappa}} - T_\kappa\|_{L^2\to L^2} \le \|T_{\mathcal{G}_{n,p,\kappa}} - \hat{T}\|_{L^2\to L^2} + \|\hat{T} - T_\kappa\|_{L^2\to L^2} \le \theta$$

for $\theta$ as in the Theorem. This proves the second assertion in the Theorem. To prove the first one, first notice that, since $E_nH_n = I_n$ (cf. (6.8)),

$$E_nT_{\mathcal{G}_{n,p,\kappa}}H_n = \frac{1}{pn}(E_nH_n)A_{n,p,\kappa}(E_nH_n) = \frac{A_{n,p,\kappa}}{pn}.$$

Now use again the fact that $E_n$ and $H_n$ have norm at most 1 to deduce:

$$\left\|\frac{A_{n,p,\kappa}}{pn} - E_nT_\kappa H_n\right\| \le \|T_{\mathcal{G}_{n,p,\kappa}} - T_\kappa\|_{L^2\to L^2} \le \theta.$$

The other two assertions follow from the perturbation lemmas provided in the Appendix. More precisely, recall from Claim 6.1 that the eigenvalues of $T_{\mathcal{G}_{n,p,\kappa}}$ are either $0$ or equal to some $\alpha \neq 0$ with $\alpha pn \in \mathrm{spec}(A_{n,p,\kappa})$. Assertion 3 follows from Lemma A.1 applied to $T_{\mathcal{G}_{n,p,\kappa}}$ and $T_\kappa$.

As for Assertion 4, we recall from Claim 6.1 that whenever $\beta pn \in \mathrm{spec}(A_{n,p,\kappa})$ with corresponding eigenspace projection $\Pi_\beta$, the corresponding eigenspace projection of $T_{\mathcal{G}_{n,p,\kappa}}$ is $H_n\Pi_\beta E_n$. This implies that:

$$H_n\Pi_{(\alpha-\gamma)pn,(\alpha+\gamma)pn}(A_{n,p,\kappa})E_n$$

is the projection onto the eigenspaces of $T_{\mathcal{G}_{n,p,\kappa}}$ corresponding to eigenvalues between $\alpha - \gamma$ and $\alpha + \gamma$. One can apply Lemma A.2 with $\epsilon = \theta$ and $b - \gamma = a + \gamma = \alpha$ to deduce that, whenever $\alpha$ is as in assertion 4 and $\|T_{\mathcal{G}_{n,p,\kappa}} - T_\kappa\| \le \theta$,

$$\left\|H_n\Pi_{(\alpha-\gamma)pn,(\alpha+\gamma)pn}(A_{n,p,\kappa})E_n - P_\alpha\right\|_{L^2\to L^2} \le \frac{4\theta}{\pi(\gamma-\theta)}.$$

Multiplying both operators above by $E_n$ on the left and by $H_n$ on the right, using that $H_n$ and $E_n$ have norm $\le 1$ and that $E_nH_n = I_n$, we see that:

$$\left\|\Pi_{(\alpha-\gamma)pn,(\alpha+\gamma)pn}(A_{n,p,\kappa}) - E_nP_\alpha H_n\right\| \le \frac{4\theta}{\pi(\gamma-\theta)}.$$

This finishes the proof modulo inequality (6.12), which is the subject of Lemma 6.1 below.



6.4.4 Approximating $T_\kappa$

Lemma 6.1 Under Assumption 6.1, suppose $\epsilon > 0$ is given and $\kappa_\epsilon : [0,1]^2 \to \mathbb{R}_+ \cup \{0\}$ is an $L$-Lipschitz symmetric function, with values between $0$ and $K$, such that

$$\int_0^1\int_0^1(\kappa(x,y) - \kappa_\epsilon(x,y))^2\,dx\,dy \le \epsilon^2.$$

Then the following holds with probability $\ge 1 - 1/2n^2$:

$$\|T_\kappa - \hat{T}\|_{L^2\to L^2} \le 2\epsilon + c(L+K)\left(\frac{\ln n}{n}\right)^{1/4},$$

where $c > 0$ is universal.

Proof: Define $\tilde{T}$ as the integral operator with kernel

$$\sum_{1\le i,j\le n}\kappa_\epsilon(X_i,X_j)\,\chi_{\left(\frac{\sigma_n(i)-1}{n},\frac{\sigma_n(i)}{n}\right]}(x)\,\chi_{\left(\frac{\sigma_n(j)-1}{n},\frac{\sigma_n(j)}{n}\right]}(y).$$

We will bound:

$$\|T_\kappa - \hat{T}\|_{L^2\to L^2} \le \|T_\kappa - T_{\kappa_\epsilon}\|_{L^2\to L^2} + \|\hat{T} - \tilde{T}\|_{L^2\to L^2} + \|T_{\kappa_\epsilon} - \tilde{T}\|_{L^2\to L^2}. \qquad (6.13)$$

By the results in Section 2.2, one can bound the first term in the RHS by:

$$\|T_\kappa - T_{\kappa_\epsilon}\|_{L^2\to L^2}^2 = \|T_{\kappa-\kappa_\epsilon}\|_{L^2\to L^2}^2 \le \int_0^1\int_0^1(\kappa(x,y) - \kappa_\epsilon(x,y))^2\,dx\,dy \le \epsilon^2.$$

For the second term, we observe that $\hat{T} - \tilde{T}$ is of the form $T_\eta$ for $\eta$ taking the values $\kappa(X_i,X_j) - \kappa_\epsilon(X_i,X_j)$ on squares of area $1/n^2$. We deduce from the results in Section 2.2 that:

$$\|\hat{T} - \tilde{T}\|_{L^2\to L^2}^2 \le \frac{1}{n^2}\sum_{i,j=1}^n(\kappa(X_i,X_j) - \kappa_\epsilon(X_i,X_j))^2. \qquad (6.14)$$

The expected value of the RHS is:

$$\int_0^1\int_0^1(\kappa(x,y) - \kappa_\epsilon(x,y))^2\,dx\,dy \le \epsilon^2.$$

Moreover, the random variables $X_i$ are independent and replacing $X_i$ by some other $X_i' \in [0,1]$ can change the value of the sum in the RHS of (6.14) by at most $K^2/n$ (as each term is bounded by $K^2$ and only $n$ terms involve $X_i$). Azuma's inequality jjj implies:

$$\mathbb{P}\left(\frac{1}{n^2}\sum_{i,j=1}^n(\kappa(X_i,X_j) - \kappa_\epsilon(X_i,X_j))^2 > \epsilon^2 + t\right) \le e^{-nt^2/2K^4}.$$

Therefore, with probability $\ge 1 - 1/4n^2$ we have:

$$\|\hat{T} - \tilde{T}\|_{L^2\to L^2} \le \sqrt{\epsilon^2 + K^2\sqrt{\frac{2\ln(4n^2)}{n}}} \le \epsilon + cK\left(\frac{\ln n}{n}\right)^{1/4},$$

where $c > 0$ is some universal constant. We deduce:

$$\|T_\kappa - T_{\kappa_\epsilon}\|_{L^2\to L^2} + \|\hat{T} - \tilde{T}\|_{L^2\to L^2} \le 2\epsilon + cK\left(\frac{\ln n}{n}\right)^{1/4} \qquad (6.15)$$

with probability $\ge 1 - 1/4n^2$.

To finish the proof, we must bound the third term in (6.13). To do this, we notice that:



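$$\tilde{T} - T_{\kappa_\epsilon} = T_\eta,$$

where

$$\eta(x,y) = \sum_{1\le i,j\le n}(\kappa_\epsilon(X_i,X_j) - \kappa_\epsilon(x,y))\,\chi_{\left(\frac{\sigma_n(i)-1}{n},\frac{\sigma_n(i)}{n}\right]}(x)\,\chi_{\left(\frac{\sigma_n(j)-1}{n},\frac{\sigma_n(j)}{n}\right]}(y).$$

Using the definition of $\sigma_n$ from Section 6.2, one can rewrite this as:

$$\eta(x,y) = \sum_{1\le i,j\le n}(\kappa_\epsilon(X_{(i)},X_{(j)}) - \kappa_\epsilon(x,y))\,\chi_{\left(\frac{i-1}{n},\frac{i}{n}\right]}(x)\,\chi_{\left(\frac{j-1}{n},\frac{j}{n}\right]}(y).$$

Recall that $\kappa_\epsilon$ is $L$-Lipschitz and therefore, for all $(x,y) \in \left(\frac{i-1}{n},\frac{i}{n}\right]\times\left(\frac{j-1}{n},\frac{j}{n}\right]$,

$$|\eta(x,y)| \le \frac{2L}{n} + |\kappa_\epsilon(X_{(i)},X_{(j)}) - \kappa_\epsilon(i/n,j/n)| \le \frac{2L}{n} + L|X_{(i)} - i/n| + L|X_{(j)} - j/n|.$$

Integrating $\eta^2$, we find that:

$$\int_{[0,1]^2}\eta^2 \le \frac{1}{n^2}\sum_{i,j=1}^n\left(\frac{2L}{n} + L|X_{(i)} - i/n| + L|X_{(j)} - j/n|\right)^2 \le \frac{12L^2}{n^2} + 6L^2\max_{1\le i\le n}(X_{(i)} - i/n)^2,$$

where the last step uses $(a+b+c)^2 \le 3(a^2+b^2+c^2)$.

A simple calculation using e.g. Massart's version of the Dvoretzky-Kiefer-Wolfowitz inequality plf reveals that $\max_{1\le i\le n}(X_{(i)} - i/n)^2 \le c^2\ln n/n$ ($c > 0$ universal) with probability $\ge 1 - 1/4n^2$. We deduce that:

$$\|\tilde{T} - T_{\kappa_\epsilon}\|_{L^2\to L^2} = \|T_\eta\|_{L^2\to L^2} \le \sqrt{\int_{[0,1]^2}\eta^2} \le \frac{2\sqrt{3}L}{n} + cL\sqrt{\frac{6\ln n}{n}}$$

with probability $\ge 1 - 1/4n^2$. Combining this with (6.15) and replacing $c > 0$ with a larger universal constant if necessary finishes the proof. □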
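
The deviation level used in the Azuma step of the proof above can be checked by direct arithmetic. The sketch below assumes a bounded-differences constant of $K^2/n$ per coordinate, which gives a tail of the form $e^{-nt^2/2K^4}$, and confirms that $t = K^2\sqrt{2\ln(4n^2)/n}$ makes this tail equal to $1/4n^2$. The function names `azuma_tail` and `deviation_level` are hypothetical and only serve this illustration.

```python
import math


def azuma_tail(n, K, t):
    """Tail bound exp(-n t^2 / (2 K^4)), assuming bounded differences K^2 / n."""
    return math.exp(-n * t * t / (2.0 * K ** 4))


def deviation_level(n, K):
    """The level t = K^2 * sqrt(2 ln(4 n^2) / n) used in the proof."""
    return K * K * math.sqrt(2.0 * math.log(4.0 * n * n) / n)


for n in (10, 1000, 10 ** 6):
    for K in (0.5, 1.0, 3.0):
        tail = azuma_tail(n, K, deviation_level(n, K))
        target = 1.0 / (4.0 * n * n)
        print(f"n={n}, K={K}: tail={tail:.3e}, 1/(4n^2)={target:.3e}")
```

Taking a square root of $\epsilon^2 + t$ at this level is what produces the $(\ln n/n)^{1/4}$ rate in the final bound.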



7 Freedman's inequality for matrix martingales 



In this Section we prove our new concentration inequality, Theorem 1.2. We begin with some preliminaries from matrix analysis.

7.1 Preliminaries from matrix analysis

7.1.1 The positive semi-definite order

Matrix inequalities for the positive semi-definite order will be essential in our proof.

Given $A \in \mathbb{C}^{d\times d}_{\mathrm{Herm}}$, say that $A \succeq 0$ if $A$ is positive semi-definite, which is the same as saying that all eigenvalues of $A$ are non-negative, or that $v^*Av \ge 0$ for all $v \in \mathbb{C}^d$. We will also write $A \preceq B$ (for $B \in \mathbb{C}^{d\times d}_{\mathrm{Herm}}$) if $B - A \succeq 0$. Notice that $A \preceq \xi I$ for some $\xi \in \mathbb{R}$ iff $\lambda_{\max}(A) \le \xi$.

We will need four other properties of the partial order $\preceq$. The first three are easily checked and we omit their proofs:

The set $\{(A,B) \in (\mathbb{C}^{d\times d}_{\mathrm{Herm}})^2 : A \preceq B\}$ is closed in the product topology. (7.1)

$\forall\{A_i\}_{i=1}^k, \{B_i\}_{i=1}^k \subset \mathbb{C}^{d\times d}_{\mathrm{Herm}}$: "$\forall 1 \le i \le k$, $A_i \preceq B_i$" $\Rightarrow$ "$\sum_{i=1}^k A_i \preceq \sum_{i=1}^k B_i$". (7.2)

$\forall A, B \in \mathbb{C}^{d\times d}_{\mathrm{Herm}}$: "$A \succeq 0$" $\Rightarrow$ "$\lambda_{\max}(A+B) \ge \lambda_{\max}(B)$". (7.3)

The fourth one is slightly less standard.

$\forall A, B, C \in \mathbb{C}^{d\times d}_{\mathrm{Herm}}$, $(A \succeq 0 \wedge C - B \succeq 0) \Rightarrow \mathrm{Tr}(AB) \le \mathrm{Tr}(AC)$. (7.4)

To prove (7.4), notice that for $A, B, C$ as above,

$$\mathrm{Tr}(A(C-B)) = \mathrm{Tr}((C-B)^{1/2}A(C-B)^{1/2}),$$

where $(C-B)^{1/2} \in \mathbb{C}^{d\times d}_{\mathrm{Herm}}$ is the (also positive semi-definite) square root of $C - B$. Then notice that for any $v \in \mathbb{C}^d$,

$$v^*(C-B)^{1/2}A(C-B)^{1/2}v = [(C-B)^{1/2}v]^*A[(C-B)^{1/2}v] \ge 0$$

since $A \succeq 0$. This implies that $(C-B)^{1/2}A(C-B)^{1/2}$ must be positive semi-definite, hence its trace is non-negative: $\mathrm{Tr}(A(C-B)) \ge 0$, which is equivalent to (7.4) by linearity.
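
A concrete instance may help fix ideas for the trace monotonicity property (7.4). The following sketch checks it on hand-picked real symmetric $2\times 2$ matrices; the matrices and the helper names `trace_product` and `is_psd_2x2` are illustrative choices, not part of the paper.

```python
def trace_product(A, B):
    """Tr(AB) for square matrices given as lists of lists."""
    n = len(A)
    return sum(A[i][j] * B[j][i] for i in range(n) for j in range(n))


def is_psd_2x2(M, tol=1e-12):
    """A real symmetric 2x2 matrix is PSD iff its trace and determinant are >= 0."""
    trace = M[0][0] + M[1][1]
    det = M[0][0] * M[1][1] - M[0][1] * M[1][0]
    return trace >= -tol and det >= -tol


A = [[2.0, 1.0], [1.0, 1.0]]    # positive semi-definite
B = [[0.0, 1.0], [1.0, -1.0]]   # Hermitian but not positive semi-definite
P = [[1.0, 0.5], [0.5, 1.0]]    # positive semi-definite increment, C - B = P
C = [[B[i][j] + P[i][j] for j in range(2)] for i in range(2)]

lhs = trace_product(A, B)   # Tr(AB)
rhs = trace_product(A, C)   # Tr(AC)
print(lhs, rhs, is_psd_2x2(A), is_psd_2x2(P))
```

Note that $B$ itself need not be positive semi-definite: only $A \succeq 0$ and $C - B \succeq 0$ are used, exactly as in the statement of (7.4).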

7.1.2 Conditional expectations are monotone 

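We will also need the following property that relates expectations to the positive semi-definite order. Let $X$, $Y$ be integrable, random $d \times d$ Hermitian matrices defined on a common probability space $(\Omega, \mathcal{F}, \mathbb{P})$, and let $\mathcal{G} \subset \mathcal{F}$ be a sub-$\sigma$-field. Then:

If $X \preceq Y$ almost surely, then $\mathbb{E}[X \mid \mathcal{G}] \preceq \mathbb{E}[Y \mid \mathcal{G}]$ almost surely. (7.5)

To see this, it suffices to see that for all $v \in \mathbb{C}^d$, $v^*Xv \le v^*Yv$ and therefore $\mathbb{E}[v^*Xv \mid \mathcal{G}] \le \mathbb{E}[v^*Yv \mid \mathcal{G}]$. However, our definition of $\mathbb{E}[\cdot \mid \mathcal{G}]$ for matrices (cf. Section 2.4) implies that $\mathbb{E}[v^*Xv \mid \mathcal{G}] = v^*\mathbb{E}[X \mid \mathcal{G}]v$ and $\mathbb{E}[v^*Yv \mid \mathcal{G}] = v^*\mathbb{E}[Y \mid \mathcal{G}]v$. Therefore,

"$X \preceq Y$ almost surely" $\Rightarrow$ "$\forall v \in \mathbb{C}^d$, $v^*\mathbb{E}[X \mid \mathcal{G}]v \le v^*\mathbb{E}[Y \mid \mathcal{G}]v$ almost surely".

Now let $Q \subset \mathbb{C}^d$ be dense and countable. Note that for all $A \in \mathbb{C}^{d\times d}_{\mathrm{Herm}}$, $A \succeq 0$ if and only if $v^*Av \ge 0$ for all $v \in Q$. Hence

$$\mathbb{E}[X \mid \mathcal{G}] \preceq \mathbb{E}[Y \mid \mathcal{G}] \text{ a.s.} \iff \mathbb{P}\left(\forall v \in Q,\ v^*\mathbb{E}[X \mid \mathcal{G}]v \le v^*\mathbb{E}[Y \mid \mathcal{G}]v\right) = 1,$$

and the RHS follows from $X \preceq Y$ by the previous implication (since $Q$ is countable).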

7.1.3 Matrix functions and matrix exponentials 

If $f : \mathbb{C} \to \mathbb{C}$ is given by a power series $f(x) = \sum_{i=0}^\infty c_ix^i$ that converges for all $x \in \mathbb{C}$, one may define:

$$f(A) = \sum_{i=0}^\infty c_iA^i, \quad A \in \mathbb{C}^{d\times d},$$

which can be shown to converge for all $A$. $f(A)$ is Hermitian whenever $A \in \mathbb{C}^{d\times d}_{\mathrm{Herm}}$ and the coefficients $c_i$ belong to $\mathbb{R}$. In that case, the eigenvalues of $f(A)$ are given by $f(\lambda_i(A))$ for $0 \le i \le d-1$, with the same eigenvectors as $A$. In particular, $f(A) \preceq \xi I$ for some $\xi \in \mathbb{R}$ iff $f(\lambda_i(A)) \le \xi$ for each $0 \le i \le d-1$. Moreover, for all $s \ge 0$,

$$\exp(s\lambda_{\max}(A)) = \lambda_{\max}(\exp(sA)) \le \mathrm{Tr}(\exp(sA)). \qquad (7.6)$$

We need one more result from matrix analysis, called the Golden-Thompson inequality.

$$\forall d \in \{1, 2, 3, \dots\},\ \forall A, B \in \mathbb{C}^{d\times d}_{\mathrm{Herm}}:\ \mathrm{Tr}(e^{A+B}) \le \mathrm{Tr}(e^Ae^B). \qquad (7.7)$$

This inequality is fundamental in adapting the standard proofs of concentration to the matrix setting !,|1|5|.

7.2 The proof 

We begin with two simple Lemmas.

Lemma 7.1 For any matrix $C \in \mathbb{C}^{d\times d}_{\mathrm{Herm}}$ and $k \in \mathbb{N}\setminus\{0,1\}$, $C^k \preceq \|C\|^{k-2}C^2$.

Proof: $\|C\|^{k-2}C^2 - C^k$ has the same eigenvectors as $C$ and its eigenvalues are given by

$$\|C\|^{k-2}\lambda_i(C)^2 - \lambda_i(C)^k = \left(\|C\|^{k-2} - \lambda_i(C)^{k-2}\right)\lambda_i(C)^2, \quad 0 \le i \le d-1.$$

This is always $\ge 0$ because $\|C\| = \max_i|\lambda_i(C)|$. □

Lemma 7.2 For any matrix $C \in \mathbb{C}^{d\times d}_{\mathrm{Herm}}$ with $\|C\| \le 1$, $e^C \preceq I + C + C^2$.

Proof: The previous lemma implies that $C^i \preceq C^2$ for all $i \ge 2$. Property (7.2) of $\preceq$ implies that for any $k$,

$$I + C + \sum_{i=2}^k\frac{C^i}{i!} \preceq I + C + \left(\sum_{i=2}^k\frac{1}{i!}\right)C^2 \preceq I + C + C^2.$$

Now let $k \nearrow +\infty$ and use (7.1). □
The next step is an exponential inequality for martingales. 

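Lemma 7.3 (Exponential inequality for martingales) Let $Z_n$, $W_n$ be as in Theorem 1.2 with $M = 1$. Then for all $s \in [0, 1/2]$ and all deterministic $C \in \mathbb{C}^{d\times d}_{\mathrm{Herm}}$,

$$\mathbb{E}\left[\mathrm{Tr}\left[\exp\left(sZ_n - 2s^2W_n + C\right)\right]\right] \le \mathrm{Tr}\left[\exp(C)\right].$$

Proof: Set $X_n = Z_n - Z_{n-1}$ and $A_n = \mathbb{E}[X_n^2 \mid \mathcal{F}_{n-1}]$. We use Golden-Thompson (7.7) to deduce that:

$$\mathrm{Tr}\left(e^{sZ_n - 2s^2W_n + C}\right) \le \mathrm{Tr}\left(e^{sX_n - 2s^2A_n}e^{sZ_{n-1} - 2s^2W_{n-1} + C}\right).$$

Taking conditional expectations, we see that:

$$\mathbb{E}\left[\mathrm{Tr}\left(e^{sZ_n - 2s^2W_n + C}\right) \mid \mathcal{F}_{n-1}\right] \le \mathbb{E}\left[\mathrm{Tr}\left(e^{sX_n - 2s^2A_n}e^{sZ_{n-1} - 2s^2W_{n-1} + C}\right) \mid \mathcal{F}_{n-1}\right] = \mathrm{Tr}\left(\mathbb{E}\left[e^{sX_n - 2s^2A_n} \mid \mathcal{F}_{n-1}\right]e^{sZ_{n-1} - 2s^2W_{n-1} + C}\right).$$

Here the equality is a result of Tr and expected values commuting (2.3), as well as noting that $e^{sZ_{n-1} - 2s^2W_{n-1} + C}$ is $\mathcal{F}_{n-1}$-measurable and then applying (2.4) to the conditional expectation. We now make the following claim.

Claim 7.1 $\mathbb{E}\left[e^{sX_n - 2s^2A_n} \mid \mathcal{F}_{n-1}\right] \preceq I$.

This will imply (via monotonicity of the trace (7.4)) that:

$$\mathbb{E}\left[\mathrm{Tr}\left(e^{sZ_n - 2s^2W_n + C}\right) \mid \mathcal{F}_{n-1}\right] \le \mathrm{Tr}\left(e^{sZ_{n-1} - 2s^2W_{n-1} + C}\right),$$

hence

$$\mathbb{E}\left[\mathrm{Tr}\left(e^{sZ_n - 2s^2W_n + C}\right)\right] \le \mathbb{E}\left[\mathrm{Tr}\left(e^{sZ_{n-1} - 2s^2W_{n-1} + C}\right)\right],$$

and the Lemma follows from this via induction in $n$.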
To prove the claim, we first note that for $|s| \le 1/2$,

\[ \|sX_n - 2s^2 A_n\| \le \frac{\|X_n\|}{2} + \frac{\|A_n\|}{2} \le 1 \]

by the assumption that $\|X_n\| \le 1$ (which also gives $\|A_n\| \le 1$). We now apply Lemma 7.2 with $C = sX_n - 2s^2 A_n$ and the monotonicity of conditional expectations (7.4) to obtain:

\[ \mathbb{E}\left[e^{sX_n - 2s^2 A_n} \mid \mathcal{F}_{n-1}\right] \preceq \mathbb{E}\left[I + sX_n - 2s^2 A_n + s^2 X_n^2 - 2s^3 X_n A_n - 2s^3 A_n X_n + 4s^4 A_n^2 \mid \mathcal{F}_{n-1}\right]. \]

$A_n = \mathbb{E}[X_n^2 \mid \mathcal{F}_{n-1}]$ is $\mathcal{F}_{n-1}$-measurable and the martingale property implies $\mathbb{E}[X_n \mid \mathcal{F}_{n-1}] = 0$. Via equation (2.4), this implies $\mathbb{E}[A_n X_n \mid \mathcal{F}_{n-1}] = \mathbb{E}[X_n A_n \mid \mathcal{F}_{n-1}] = 0$ almost surely. This means that the RHS above is a.s. equal to:

\[ I - s^2 A_n + 4s^4 A_n^2. \]

Now notice that the eigenvalues of $-s^2 A_n + 4s^4 A_n^2$ are given by:

\[ -s^2 \lambda_i(A_n) + 4s^4 \lambda_i(A_n)^2, \quad 1 \le i \le d. \]

The inequality $s \le 1/2$ implies $4s^4 \le s^2$. Moreover, each $\lambda_i(A_n)$ is between $0$ and $1$, since $\|A_n\| \le 1$ and $A_n \succeq 0$ (it is the conditional expectation of $X_n^2$). This implies that the above expression is at most:

\[ -s^2 \lambda_i(A_n) + s^2 \lambda_i(A_n) = 0 \]

for each $i$. Therefore, $-s^2 A_n + 4s^4 A_n^2 \preceq 0$ and (again using the monotonicity property (7.5))

\[ \mathbb{E}\left[e^{sX_n - 2s^2 A_n} \mid \mathcal{F}_{n-1}\right] \preceq I \text{ almost surely.} \] □

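Proof: [of Theorem 1.2] One may assume that $M = 1$ (one can always rescale $Z_n$ so that this is the case; the bound behaves accordingly). If $\lambda_{\max}(W_n) \le \sigma^2$, then $\sigma^2 I - W_n \succeq 0$, i.e. it is positive semi-definite. Inequality (7.3) then implies that for all $s > 0$,

\[ \lambda_{\max}(sZ_n + 2s^2\sigma^2 I - 2s^2 W_n) \ge \lambda_{\max}(sZ_n) = s\lambda_{\max}(Z_n). \]

Therefore,

\[ \forall s > 0, \quad \mathbb{P}\left(\lambda_{\max}(Z_n) \ge t,\ \lambda_{\max}(W_n) \le \sigma^2\right) \le \mathbb{P}\left(\lambda_{\max}(sZ_n + 2s^2\sigma^2 I - 2s^2 W_n) \ge st\right) \le e^{-st}\, \mathbb{E}\left[\exp\left(\lambda_{\max}(sZ_n + 2s^2\sigma^2 I - 2s^2 W_n)\right)\right]. \]

We now use the inequality "$e^{\lambda_{\max}(sZ)} \le \operatorname{Tr}(e^{sZ})$", valid for any $s \ge 0$ and $Z \in \mathbb{C}^{d\times d}_{\mathrm{Herm}}$ (the trace of the positive semi-definite matrix $e^{sZ}$ is at least its largest eigenvalue), together with the exponential inequality in Lemma 7.3, to deduce that for all $s \in [0, 1/2]$,

\[ \mathbb{P}\left(\lambda_{\max}(sZ_n + 2s^2\sigma^2 I - 2s^2 W_n) \ge st\right) \le e^{-st}\, \mathbb{E}\left[\operatorname{Tr}\left(\exp(sZ_n + 2s^2\sigma^2 I - 2s^2 W_n)\right)\right] \le \operatorname{Tr}\left(\exp(2s^2\sigma^2 I)\right) e^{-st} = d\, e^{2s^2\sigma^2 - st}. \]

Set

\[ s = \frac{t}{4\sigma^2 + 2t}. \]

Notice that with this choice $s \le 1/2$ always. Moreover,

\[ 2s^2\sigma^2 = \frac{t^2}{8\sigma^2 (1 + t/2\sigma^2)^2} \le \frac{t^2}{8\sigma^2 (1 + t/2\sigma^2)} = \frac{st}{2}, \]

so that $2s^2\sigma^2 - st \le -st/2 = -t^2/(8\sigma^2 + 4t)$. Hence:

\[ \mathbb{P}\left(\lambda_{\max}(Z_n) \ge t,\ \lambda_{\max}(W_n) \le \sigma^2\right) \le d\, e^{-\frac{t^2}{8\sigma^2 + 4t}}, \]

as desired. □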
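The last step of the proof is a one-variable computation: with $s = t/(4\sigma^2 + 2t)$ the exponent $2s^2\sigma^2 - st$ is at most $-t^2/(8\sigma^2 + 4t)$. The sketch below checks this numerically for a few values in the case $M = 1$; the function names and the test values are illustrative assumptions, not taken from the paper.

```python
import math

def proof_choice_of_s(t: float, sigma2: float) -> float:
    """The value s = t / (4 sigma^2 + 2 t) used in the proof (case M = 1)."""
    return t / (4.0 * sigma2 + 2.0 * t)

def exponent(t: float, sigma2: float, s: float) -> float:
    """The exponent 2 s^2 sigma^2 - s t appearing in d * exp(2 s^2 sigma^2 - s t)."""
    return 2.0 * s * s * sigma2 - s * t

def tail_bound(t: float, sigma2: float, d: int) -> float:
    """Final bound d * exp(-t^2 / (8 sigma^2 + 4 t)) for M = 1."""
    return d * math.exp(-t * t / (8.0 * sigma2 + 4.0 * t))

# Arbitrary (t, sigma^2) pairs covering small and large deviations.
checks = []
for t, sigma2 in [(0.5, 1.0), (3.0, 2.0), (10.0, 0.25), (40.0, 100.0)]:
    s = proof_choice_of_s(t, sigma2)
    target = -t * t / (8.0 * sigma2 + 4.0 * t)
    checks.append(0.0 <= s <= 0.5 and exponent(t, sigma2, s) <= target + 1e-12)

print(all(checks))
```

The choice of $s$ is not the exact minimizer of the exponent over $[0, 1/2]$; it is a convenient value that always lies in the admissible range and gives the stated closed form.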



Remark 7.1 It is well-known in the scalar case that inequalities for martingales imply inequalities for independent sums. The same is true in the matrix setting. Let $X_1, \ldots, X_n$ be mean-zero independent random matrices, defined on a common probability space $(\Omega, \mathcal{F}, \mathbb{P})$, with values in $\mathbb{C}^{d\times d}_{\mathrm{Herm}}$ and such that there exists an $M > 0$ with $\|X_i\| \le M$ almost surely for all $1 \le i \le n$. Letting $\mathcal{F}_0 = \{\emptyset, \Omega\}$ and $\mathcal{F}_i = \sigma(X_1, \ldots, X_i)$ ($i \in [n]$), one can see that:

\[ \left\{\left(Z_i = \sum_{j=1}^{i} X_j,\ \mathcal{F}_i\right)\right\}_{i=0}^{n} \]

is a martingale satisfying the assumptions of the Theorem and that, moreover, $W_n$ is deterministic in this case:

\[ W_n = \sum_{i=1}^{n} \mathbb{E}\left[(Z_i - Z_{i-1})^2 \mid \mathcal{F}_{i-1}\right] = \sum_{i=1}^{n} \mathbb{E}\left[X_i^2\right]. \]

Thus one may take:

\[ \sigma^2 = \lambda_{\max}\left(\sum_{i=1}^{n} \mathbb{E}\left[X_i^2\right]\right) \]

in Theorem 1.2 and deduce the first half of the Corollary below. The other half comes from considering $-\sum_{i=1}^{n} X_i$.



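Corollary 7.1 Let $X_1, \ldots, X_n$ be mean-zero independent random matrices, defined on a common probability space $(\Omega, \mathcal{F}, \mathbb{P})$, with values in $\mathbb{C}^{d\times d}_{\mathrm{Herm}}$ and such that there exists an $M > 0$ with $\|X_i\| \le M$ almost surely for all $1 \le i \le n$. Define:

\[ \sigma^2 = \lambda_{\max}\left(\sum_{i=1}^{n} \mathbb{E}\left[X_i^2\right]\right). \]

Then for all $t \ge 0$,

\[ \mathbb{P}\left(\lambda_{\max}\left(\sum_{i=1}^{n} X_i\right) \ge t\right) \le d\, e^{-\frac{t^2}{8\sigma^2 + 4Mt}} \]

and

\[ \mathbb{P}\left(\left\|\sum_{i=1}^{n} X_i\right\| \ge t\right) \le 2d\, e^{-\frac{t^2}{8\sigma^2 + 4Mt}}. \]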



8 Final remarks 



Sharpness of Theorem 1.2. One can show that Theorem 1.2 is close to sharp and that, in particular, the $d$ factor in the bound is necessary for general martingale sequences. To see this, consider a sum $Z_n$ of $n$ independent, identically distributed $d \times d$ diagonal random matrices $X_1, \ldots, X_n$ whose diagonal entries are independent, unbiased $\pm 1$. The largest eigenvalue of $Z_n$ is a maximum of $d$ independent random sums, each with $n$ terms of the kind $\pm 1$ above. One can see that for large $n$ and $d$ and for $t \sim \sqrt{n \ln d}$,

\[ \mathbb{P}\left(\lambda_{\max}(Z_n) \ge t\right) \ge d\, e^{-(1+o(1))\, t^2/2n}, \]

which is what Corollary 7.1 gives up to the constants in the exponent.
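For the diagonal $\pm 1$ example the tail of $\lambda_{\max}(Z_n)$ can be computed exactly, because each diagonal entry of $Z_n$ is a shifted binomial variable. The sketch below compares that exact tail with the bound of Corollary 7.1, using $M = 1$ and $\sigma^2 = n$ (each $X_i^2 = I$). The parameter values $n = 400$, $d = 50$ and the multiples of $\sqrt{n \ln d}$ are arbitrary illustrative choices.

```python
import math

def walk_tail(n: int, t: float) -> float:
    """Exact P(S_n >= t) for a sum S_n of n independent unbiased +-1 signs."""
    # S_n = 2*K - n with K ~ Binomial(n, 1/2), so S_n >= t iff K >= (n + t) / 2.
    k_min = max(math.ceil((n + t) / 2.0), 0)
    return sum(math.comb(n, k) for k in range(k_min, n + 1)) / 2.0 ** n

def lambda_max_tail(n: int, d: int, t: float) -> float:
    """Exact P(lambda_max(Z_n) >= t): the maximum of d independent walks."""
    p = walk_tail(n, t)
    # 1 - (1 - p)^d, written with log1p/expm1 to stay accurate for tiny p.
    return -math.expm1(d * math.log1p(-p))

def corollary_bound(n: int, d: int, t: float) -> float:
    """Corollary 7.1 with M = 1 and sigma^2 = n."""
    return d * math.exp(-t * t / (8.0 * n + 4.0 * t))

n, d = 400, 50
rows = []
for c in (1.0, 2.0, 3.0, 4.0):
    t = c * math.sqrt(n * math.log(d))
    rows.append((c, lambda_max_tail(n, d, t), corollary_bound(n, d, t)))
    print(f"c = {c}: exact = {rows[-1][1]:.3e}, bound = {rows[-1][2]:.3e}")
```

In every row the exact tail sits below the bound, and both carry the same factor $d$; the difference lies in the constant multiplying $t^2$ in the exponent.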

An interesting question is to understand the circumstances under which one can remove the $d$ factor from the bound. For instance, can the sharper results of [31, 32] be reobtained via some variant of Theorem 1.2?



Other applications of Theorem 1.2. In a related paper (in preparation) we show how Theorem 1.2 can be used to show concentration of the matrices of random lifts of large graphs. A pleasing corollary of our result is this: consider a random $k_1 k_2$-lift of a large graph $G$ with minimum degree $\omega(\ln(k_1 k_2 n))$. The Laplacian of this lift is essentially indistinguishable from that of the (in principle very different) random graph obtained by performing a $k_1$-lift on $G$ and then a $k_2$-lift on the resulting graph.



It would be interesting to see other applications of Theorem 1.2, especially in settings where the Christofides-Markstrom bound is useless because its variance term is too large (cf. Remark 3.1).

The Laplacian of inhomogeneous random graphs. The results of Section 6 can be extended to the Laplacian $\mathcal{L}_{n,p,\kappa}$ of $G_{n,p,\kappa}$. More precisely, add the following condition to Assumption 6.1: that there exists a $\kappa_- > 0$ such that for all $x \in [0,1]$, $\kappa(x) = \int_0^1 \kappa(x,y)\, dy \ge \kappa_-$. Then
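there is a close correspondence between $\mathcal{L}_{n,p,\kappa}$ and the operator $S_\kappa = \mathrm{Id}_{L^2} - T_{\tilde\kappa}$, where $\mathrm{Id}_{L^2}$ is the identity operator on $L^2([0,1])$ and $T_{\tilde\kappa}$ is the integral operator given by the symmetric, non-negative function:

\[ \tilde\kappa(x,y) = \frac{\kappa(x,y)}{\sqrt{\kappa(x)\,\kappa(y)}}. \]

That is, if $p \le 1/K$ and $pn\kappa_- \gg C \ln n$ for some $C$, we will have:

\[ \|\mathcal{L}_{n,p,\kappa} - E_n S_\kappa H_n\| = o(1) \quad \text{and} \quad \|H_n \mathcal{L}_{n,p,\kappa} E_n - S_\kappa\| = o(1), \]

with consequences for the spectrum and eigenspaces of $\mathcal{L}_{n,p,\kappa}$. We omit the details.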



Better bounds and extensions? We have mentioned the results on spectral gaps in references [33] and [2S], on $G_{n,p}$ and random graphs with given expected degrees. These papers actually do much more than we described, as they show that, even in very sparse graphs, there is a large "core" set of vertices so that the matrices of the induced subgraph are well-behaved. It would be interesting to prove a similar result either for more general instances of bond percolation or for inhomogeneous random graphs.



Cut convergence, eigenvalues and eigenvectors. It is not clear to the author what one can/cannot 
prove about eigenvectors and eigenvalues of sparse graphs while only assuming that they converge 
to a given $\kappa$ in the cut norm. Ideally, one would wish to be able to prove that this suffices for
the convergence of the given operators, at least under suitable assumptions, but it is not clear 
how one should proceed. 



A Appendix: two perturbation results 

The following functional-analytic perturbation results are needed in the main text. In what follows $\mathcal{H}$ is a real Hilbert space and $\|\cdot\|$ denotes both the Hilbert space norm and the induced norm on linear operators. Undefined notions and quoted results can be found in any textbook on Functional Analysis, e.g. [55].



Lemma A.1 Suppose $V$, $W$ are compact Hermitian linear operators on the Hilbert space $\mathcal{H}$ that satisfy $\|V - W\| \le \epsilon$. Let $\mathrm{spec}(V)$, $\mathrm{spec}(W)$ denote the spectra of $V$ and $W$ (respectively). Let $S \subset \mathbb{R}$ be such that $\inf_{s \in S} |s| > \epsilon$ and let $m_V(S)$ be the sum of the multiplicities of all elements of $\mathrm{spec}(V) \cap S$. Then:

\[ m_V(S) \le m_W(S^\epsilon) \]

where for $A \subset \mathbb{R}$, $A^\epsilon = \{x \in \mathbb{R} : \exists a \in A,\ |x - a| \le \epsilon\}$.

Proof: This is evident if both $V$ and $W$ have finite-dimensional rank. In this case one may restrict to the span of the two ranges, which is a finite-dimensional space isomorphic to some $\mathbb{R}^d$, and then apply (3.1). [Do notice that $0$ might belong to the spectrum of the restriction of $V$ or $W$ to the finite-dimensional subspace, even though it does not belong to the original spectra. This, however, will not matter, due to the condition $\inf_{s \in S} |s| > \epsilon$.]

For the case of infinite-dimensional rank, $V$ and $W$ are the limit (in the operator norm) of operators of finite-dimensional rank. More specifically, recall from Section 2.2 that the spectral theorem for compact, self-adjoint operators states that $V$ can be written as a sum:

\[ V = \sum_{\alpha \in \mathrm{spec}(V)} \alpha P_\alpha \]

where the $P_\alpha$ are orthogonal projectors of orthogonal ranges, with finite rank if $\alpha \ne 0$. Moreover, for any $\delta > 0$, $\mathrm{spec}(V) \setminus (-\delta, \delta)$ is finite. Therefore, the finite-rank operator:

\[ V_\delta = \sum_{\alpha \in \mathrm{spec}(V) \setminus (-\delta, \delta)} \alpha P_\alpha \]

satisfies $\|V_\delta - V\| \le \delta$. One may similarly define $W_\delta$ with $\|W_\delta - W\| \le \delta$ and it follows that $\|V_\delta - W_\delta\| \le \epsilon + 2\delta$. Moreover, we have the simple fact:

\[ \forall A \subset \mathbb{R} \setminus [-\delta, \delta], \quad m_{V_\delta}(A) = m_V(A) \text{ and } m_{W_\delta}(A) = m_W(A). \quad \text{(A.1)} \]

Let $\delta > 0$ be small, so that $\inf_{s \in S} |s| > \epsilon + 3\delta$. The finite-dimensional result implies:

\[ m_{V_\delta}(S) \le m_{W_\delta}(S^{\epsilon + 2\delta}). \]

Notice that $m_{V_\delta}(S) = m_V(S)$ because $S \subset \mathbb{R} \setminus [-\epsilon, \epsilon] \subset \mathbb{R} \setminus [-\delta, \delta]$ and therefore (A.1) applies. Moreover, $\forall x \in S^{\epsilon + 2\delta}$,

\[ |x| \ge \inf_{s \in S} |s| - \epsilon - 2\delta > \delta \]

by the choice of $\delta$; therefore $S^{\epsilon + 2\delta} \subset \mathbb{R} \setminus [-\delta, \delta]$ and we can apply (A.1) again to deduce that $m_{W_\delta}(S^{\epsilon + 2\delta}) = m_W(S^{\epsilon + 2\delta})$. These facts imply:

\[ m_V(S) \le m_W(S^{\epsilon + 2\delta}). \]

It is an exercise to show that $m_W(S^{\epsilon + 2\delta}) \searrow m_W(S^\epsilon)$ when $\delta \searrow 0$. This finishes the proof. □

Lemma A.2 Suppose $V$, $W$ are compact Hermitian linear operators on the Hilbert space $\mathcal{H}$ that satisfy $\|V - W\| \le \epsilon$. Let $a < b$ and $\gamma > \epsilon$ be such that $a + \gamma < b - \gamma$ and $V$ has no eigenvalues in $(a - \gamma, a + \gamma) \cup (b - \gamma, b + \gamma)$. Define $\Pi_{a,b}(V)$ as the projector onto the span of the eigenvectors of $V$ corresponding to $a \le \lambda_k(V) \le b$ and define $\Pi_{a,b}(W)$ similarly. Then:

\[ \|\Pi_{a,b}(V) - \Pi_{a,b}(W)\| \le \frac{(b - a + 2\gamma)\,\epsilon}{\pi(\gamma^2 - \gamma\epsilon)}. \]



Proof: Suppose first that $\mathcal{H}$ is finite-dimensional, in which case one may assume that $\mathcal{H} = \mathbb{C}^d$ for some $d$ and that $V$ and $W$ are matrices. In this case we use a standard technique involving contour integration in the complex plane and the resolvent of linear operators [41, Chapter 2].

Let $C$ be the rectangular contour in the complex plane that passes through the points $a + \gamma\sqrt{-1}$, $a - \gamma\sqrt{-1}$, $b - \gamma\sqrt{-1}$, $b + \gamma\sqrt{-1}$ in counterclockwise order. The Cauchy formula implies that for all $\lambda \in \mathbb{R} \setminus \{a, b\}$,

\[ \frac{1}{2\pi\sqrt{-1}} \oint_C \frac{dz}{z - \lambda} = \begin{cases} 1, & a < \lambda < b; \\ 0, & \text{otherwise.} \end{cases} \]

Now consider the resolvent:

\[ R_V(z) = (zI - V)^{-1}, \quad z \in \mathbb{C} \setminus \{\lambda_i(V) : 0 \le i \le d - 1\}. \]

The spectral theorem implies that:

\[ R_V(z) = \sum_{k=0}^{d-1} \frac{\psi_{k,V}\, \psi_{k,V}^*}{z - \lambda_k(V)} \]

where $\psi_{k,V}$ is the eigenvector of $V$ corresponding to $\lambda_k(V)$. By assumption, $V$ has no eigenvalues on $C$, therefore:

\[ \frac{1}{2\pi\sqrt{-1}} \oint_C R_V(z)\, dz = \sum_{k=0}^{d-1} \frac{1}{2\pi\sqrt{-1}} \oint_C \frac{\psi_{k,V}\, \psi_{k,V}^*}{z - \lambda_k(V)}\, dz = \sum_{k : \lambda_k(V) \in [a,b]} \psi_{k,V}\, \psi_{k,V}^* = \Pi_{a,b}(V). \]

Now define the resolvent $R_W(z) = (zI - W)^{-1}$. Recall that $|\lambda_i(V) - \lambda_i(W)| \le \epsilon < \gamma$ by (3.1) and that no eigenvalue of $V$ lies in $(a - \gamma, a + \gamma) \cup (b - \gamma, b + \gamma)$ (by assumption). This implies that no eigenvalue of $W$ can lie on $a$ or $b$. Therefore, the same reasoning used above implies that:

\[ \frac{1}{2\pi\sqrt{-1}} \oint_C R_W(z)\, dz = \Pi_{a,b}(W). \]






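In particular,

\[ \|\Pi_{a,b}(V) - \Pi_{a,b}(W)\| = \left\| \frac{1}{2\pi\sqrt{-1}} \oint_C (R_V(z) - R_W(z))\, dz \right\|. \]

It is not hard to show that:

\[ \left\| \frac{1}{2\pi\sqrt{-1}} \oint_C (R_V(z) - R_W(z))\, dz \right\| \le \frac{1}{2\pi} \oint_C \|R_V(z) - R_W(z)\|\, d|z|. \]

Since $C$ has length $2(b - a) + 4\gamma$, we have:

\[ \|\Pi_{a,b}(V) - \Pi_{a,b}(W)\| \le \frac{b - a + 2\gamma}{\pi} \max_{z \in C} \|R_V(z) - R_W(z)\|. \quad \text{(A.2)} \]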



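We now bound the difference between the resolvents. Recall that for $T \in \mathbb{C}^{d\times d}$ with $\|T\| < 1$,

\[ (I - T)^{-1} = \sum_{n \ge 0} T^n. \]

Suppose we can show that $\|(W - V) R_V(z)\| \le \alpha < 1$ for $z \in C$. Then:

\[ \|R_W(z) - R_V(z)\| = \left\|((zI - V) - (W - V))^{-1} - R_V(z)\right\| \]
\[ = \left\|(zI - V)^{-1} \left(I - (W - V)(zI - V)^{-1}\right)^{-1} - R_V(z)\right\| \]
\[ = \left\|R_V(z) \left\{\left(I - (W - V) R_V(z)\right)^{-1} - I\right\}\right\| \]
\[ = \left\|\sum_{n \ge 1} R_V(z) \left[(W - V) R_V(z)\right]^n\right\| \]
\[ \le \|R_V(z)\| \sum_{n \ge 1} \|(W - V) R_V(z)\|^n \]
\[ \le \|R_V(z)\|\, \frac{\alpha}{1 - \alpha}. \]

But in our case we have:

\[ \|R_V(z)\| = \left\|\sum_{k=0}^{d-1} \frac{\psi_{k,V}\, \psi_{k,V}^*}{z - \lambda_k(V)}\right\| = \max_k |z - \lambda_k(V)|^{-1} \le 1/\gamma \]

because all $\lambda_k(V)$ lie at distance $\ge \gamma$ from the contour $C$ (this follows from the assumption that no $\lambda_k(V)$ is in $(a - \gamma, a + \gamma) \cup (b - \gamma, b + \gamma)$). Moreover, $\|W - V\| \le \epsilon$ by assumption. Therefore, $\|(W - V) R_V(z)\| \le \epsilon/\gamma < 1$ and, by the above with $\alpha = \epsilon/\gamma$,

\[ \|R_W(z) - R_V(z)\| \le \frac{\epsilon}{\gamma^2 - \gamma\epsilon}. \]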



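Together with (A.2) this finishes the proof for the finite-dimensional case.

We now consider the case of arbitrary $\mathcal{H}$. Recall the definitions of $V_\delta$ and $W_\delta$ from the previous proof. It is easy to deduce from the definition of $V_\delta$ that for any $v \in \mathcal{H}$,

\[ \Pi_{a,b}(V)\, v = \lim_{\delta \searrow 0} \Pi_{a,b}(V_\delta)\, v \]

and similarly

\[ \Pi_{a,b}(W)\, v = \lim_{\delta \searrow 0} \Pi_{a,b}(W_\delta)\, v \quad \text{where } W_\delta = \sum_{i : |\lambda_{i,W}| \ge \delta} \lambda_{i,W}\, \psi_{i,W}\, \psi_{i,W}^*. \]

Since $V_\delta$ and $W_\delta$ have finite dimensional rank, one sees from the first part that for all small enough $\delta > 0$,

\[ \left\|\left(\Pi_{a,b}(V_\delta) - \Pi_{a,b}(W_\delta)\right) v\right\| \le \|v\|\, \|\Pi_{a,b}(V_\delta) - \Pi_{a,b}(W_\delta)\| \le \|v\|\, \frac{(b - a + 2\gamma)(\epsilon + 2\delta)}{\pi(\gamma^2 - \gamma(\epsilon + 2\delta))} \]

since $\|V_\delta - W_\delta\| \le \epsilon + 2\delta < \gamma$. Letting $\delta \searrow 0$ implies:

\[ \left\|\left(\Pi_{a,b}(V) - \Pi_{a,b}(W)\right) v\right\| \le \|v\|\, \frac{(b - a + 2\gamma)\,\epsilon}{\pi(\gamma^2 - \gamma\epsilon)} \]

and since $v$ is arbitrary this finishes the proof. □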
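As a sanity check of Lemma A.2, consider a hypothetical $2 \times 2$ example that is not from the paper: $V = \mathrm{diag}(0, 1)$ and $W$ the rank-one projector onto the direction $(-\sin\theta, \cos\theta)$. With $a = 0.5$, $b = 1.5$ and $\gamma = 0.4$, both $\Pi_{a,b}(V) = V$ and $\Pi_{a,b}(W) = W$, so the left-hand side of the lemma equals $\|V - W\| = \sin\theta$. The sketch below compares this with the right-hand side; all names and parameter values are illustrative assumptions.

```python
import math

def sym_norm(p: float, q: float, r: float) -> float:
    """Operator norm of the real symmetric 2x2 matrix [[p, q], [q, r]]."""
    mid = (p + r) / 2.0
    rad = math.hypot((p - r) / 2.0, q)
    return max(abs(mid + rad), abs(mid - rad))

def lemma_a2_bound(a: float, b: float, gamma: float, eps: float) -> float:
    """Right-hand side (b - a + 2 gamma) eps / (pi (gamma^2 - gamma eps))."""
    return (b - a + 2.0 * gamma) * eps / (math.pi * (gamma * gamma - gamma * eps))

theta = 0.1  # rotation angle, an arbitrary small value
c, s = math.cos(theta), math.sin(theta)

# V = [[0, 0], [0, 1]] and W = u u^T with u = (-s, c), so
# V - W = [[-s^2, s c], [s c, 1 - c^2]].
eps = sym_norm(-s * s, s * c, 1.0 - c * c)  # this is ||V - W||

a, b, gamma = 0.5, 1.5, 0.4   # V has eigenvalues 0 and 1, outside both gaps
projector_gap = eps           # Pi_{a,b}(V) = V and Pi_{a,b}(W) = W here
bound = lemma_a2_bound(a, b, gamma, eps)
print(projector_gap <= bound)
```

In this example the bound is far from tight, which is consistent with the lemma being a worst-case estimate obtained from the length of the contour.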